Abstract —We derive information-theoretic converses (i.e., 
lower bounds) for the minimum time required by any algorithm 
for distributed function computation over a network of point- 
to-point channels with finite capacity, where each node of the 
network initially has a random observation and aims to compute 
a common function of all observations to a given accuracy with a 
given confidence by exchanging messages with its neighbors. We 
obtain the lower bounds on computation time by examining the 
conditional mutual information between the actual function value 
and its estimate at an arbitrary node, given the observations in 
an arbitrary subset of nodes containing that node. The main 
contributions include: 1) A lower bound on the conditional 
mutual information via so-called small ball probabilities, which 
captures the dependence of the computation time on the joint 
distribution of the observations at the nodes, the structure of the 
function, and the accuracy requirement. For linear functions, the 
small ball probability can be expressed by Lévy concentration 
functions of sums of independent random variables, for which 
tight estimates are available that lead to strict improvements over 
existing lower bounds on computation time. 2) An upper bound 
on the conditional mutual information via strong data processing 
inequalities, which complements and strengthens existing cutset- 
capacity upper bounds. 3) A multi-cutset analysis that quantifies 
the loss (dissipation) of the information needed for computation 
as it flows across a succession of cutsets in the network. This 
analysis is based on reducing a general network to a line network 
with bidirectional links and self-links, and the results highlight 
the dependence of the computation time on the diameter of the 
network, a fundamental parameter that is missing from most of 
the existing lower bounds on computation time. 

Index Terms —Distributed function computation, computation 
time, small ball probability, Lévy concentration function, strong 
data processing inequality, cutset bound, multi-cutset analysis 

I. Introduction and preview of results 
A. Model and problem formulation 

The problem of distributed function computation arises in 
such applications as inference and learning in networks and 
consensus or coordination of multiple agents. Each node of 
the network has an initial random observation and aims to 
compute a common function of the observations of all the 
nodes by exchanging messages with its neighbors over discrete 
memoryless point-to-point channels and by performing local 
computations. A problem of theoretical and practical interest is 
to determine the fundamental limits on the computation time, 
i.e., the minimum number of steps needed by any distributed 
computation algorithm to guarantee that, when the algorithm 
terminates, each node has an accurate estimate of the function 
value with high probability. 

Formally, a network consisting of nodes connected by
point-to-point channels is represented by a directed graph
$G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is a finite set of nodes and
$\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is a set of edges. Node $u$ can send
messages to node $v$ only if $(u,v) \in \mathcal{E}$. Accordingly, to each edge
$e \in \mathcal{E}$ we associate a discrete memoryless channel with finite input
alphabet $\mathsf{X}_e$, finite output alphabet $\mathsf{Y}_e$, and stochastic
transition law $K_e$ that specifies the transition probabilities
$K_e(y_e | x_e)$ for all $(x_e, y_e) \in \mathsf{X}_e \times \mathsf{Y}_e$. The channels
corresponding to different edges are assumed to be independent. Initially,
each node $v$ has access to an observation given by a random variable (r.v.)
$W_v$ taking values in some space $\mathsf{W}_v$. We assume that the joint
probability law $\mathbb{P}_W$ of $W = (W_v)_{v \in \mathcal{V}}$ is known to all the
nodes. Given a function $f : \prod_{v \in \mathcal{V}} \mathsf{W}_v \to \mathsf{Z}$,
each node aims to estimate the value $Z = f(W)$ via local communication and
computation. For example, when $f$ is given by the identity
mapping $Z = W$, the goal of each node is to estimate the
observations of all other nodes in the network.

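The operation of the network is synchronized and takes
place in discrete time. A $T$-step algorithm $\mathcal{A}$ is a collection
of deterministic encoders $(\varphi_{v,t})$ and estimators $(\psi_v)$, for all
$v \in \mathcal{V}$ and $t \in \{1, \ldots, T\}$, given by mappings

$$\varphi_{v,t} : \mathsf{W}_v \times \mathsf{Y}_{v\leftarrow}^{t-1} \to \mathsf{X}_{v\rightarrow}, \qquad \psi_v : \mathsf{W}_v \times \mathsf{Y}_{v\leftarrow}^{T} \to \mathsf{Z},$$

where $\mathsf{X}_{v\rightarrow} = \prod_{u \in \mathcal{N}_{v\rightarrow}} \mathsf{X}_{(v,u)}$ and
$\mathsf{Y}_{v\leftarrow} = \prod_{u \in \mathcal{N}_{v\leftarrow}} \mathsf{Y}_{(u,v)}$.
Here, $\mathcal{N}_{v\leftarrow} = \{u \in \mathcal{V} : (u,v) \in \mathcal{E}\}$ and
$\mathcal{N}_{v\rightarrow} = \{u \in \mathcal{V} : (v,u) \in \mathcal{E}\}$ are,
respectively, the in-neighborhood and the out-neighborhood of node $v$.
The algorithm operates as follows: at each step $t$, each node $v$ computes
$X_{v,t} = (X_{(v,u),t})_{u \in \mathcal{N}_{v\rightarrow}} = \varphi_{v,t}(W_v, Y_v^{t-1}) \in \mathsf{X}_{v\rightarrow}$,
and then transmits each message $X_{(v,u),t}$ along the edge $(v,u) \in \mathcal{E}$.
For each $(u,v) \in \mathcal{E}$, the received message $Y_{(u,v),t}$ at each $t$ is
related to the transmitted message $X_{(u,v),t}$ via the stochastic transition
law $K_{(u,v)}$. At step $T$, each node $v$ computes
$\widehat{Z}_v = \psi_v(W_v, Y_v^T)$ as an estimate of $Z$, where
$Y_{v,t} = (Y_{(u,v),t})_{u \in \mathcal{N}_{v\leftarrow}} \in \mathsf{Y}_{v\leftarrow}$ for
$t \in \{1, \ldots, T\}$.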

Given a nonnegative distortion function $d : \mathsf{Z} \times \mathsf{Z} \to \mathbb{R}^+$,
we use the excess distortion probability $\mathbb{P}[d(Z, \widehat{Z}_v) > \varepsilon]$ to
quantify the computation fidelity of the algorithm at node $v$.
A key fundamental limit of distributed function computation
is the $(\varepsilon, \delta)$-computation time:

$$T(\varepsilon, \delta) = \inf\Big\{ T \in \mathbb{N} : \exists \text{ a } T\text{-step algorithm } \mathcal{A} \text{ such that } \max_{v \in \mathcal{V}} \mathbb{P}\big[d(Z, \widehat{Z}_v) > \varepsilon\big] \le \delta \Big\}. \tag{1}$$

If an algorithm $\mathcal{A}$ has the property that

$$\max_{v \in \mathcal{V}} \mathbb{P}\big[d(Z, \widehat{Z}_v) > \varepsilon\big] \le \delta,$$

then we say that it achieves accuracy $\varepsilon$ with confidence $1 - \delta$.
Thus, $T(\varepsilon, \delta)$ is the minimum number of time steps needed
by any algorithm to achieve accuracy $\varepsilon$ with confidence $1 - \delta$.
The objective of this paper is to derive general lower bounds
on $T(\varepsilon, \delta)$ for arbitrary network topologies, discrete memoryless
channel models, continuous or discrete observations, and
functions $f$.
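To make the quantity in (1) concrete, the following sketch estimates the maximal excess distortion probability of one particular scheme by simulation. Everything in it is an illustrative assumption rather than part of the paper's analysis: a two-node network, independent uniform bits, $Z = W_1 \oplus W_2$, Hamming distortion with $\varepsilon = 0$, BSC links, and a naive repetition scheme with majority decoding. Any $T$ for which this scheme meets the target $\delta$ is only an upper bound on $T(0, \delta)$; the lower bounds derived in the paper go in the opposite direction.

```python
import random

def bsc(bit, p, rng):
    """Pass one bit through a binary symmetric channel with crossover probability p."""
    return bit ^ (rng.random() < p)

def error_flags(T, p, rng):
    """Run a T-step repetition scheme once; return one error flag per node.

    Hypothetical scheme: each node sends its own bit T times, the receiver
    takes a majority vote and XORs the vote with its own bit.
    """
    w = [rng.randrange(2), rng.randrange(2)]
    z = w[0] ^ w[1]
    flags = []
    for v in (0, 1):
        u = 1 - v
        received = [bsc(w[u], p, rng) for _ in range(T)]
        guess = int(2 * sum(received) > T)  # majority vote, ties resolved to 0
        flags.append((w[v] ^ guess) != z)
    return flags

def max_error_probability(T, p, trials=20000, seed=0):
    """Monte Carlo estimate of max_v P[Z_hat_v != Z] for the scheme above."""
    rng = random.Random(seed)
    counts = [0, 0]
    for _ in range(trials):
        for v, bad in enumerate(error_flags(T, p, rng)):
            counts[v] += bad
    return max(counts) / trials
```

Scanning $T = 1, 2, \ldots$ until `max_error_probability(T, p)` drops below a chosen $\delta$ gives the number of steps this specific scheme needs, which can then be compared with the converse bounds.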

Previously, this problem (for real-valued functions and
quadratic distortion) has been studied by Ayaso et al. [1]
and by Como and Dahleh [2] using information-theoretic
techniques. This problem is also related to the study of
communication complexity of distributed computing over noisy
channels. In that context, Goyal et al. [3] studied the problem
of computing Boolean functions in complete graphs, where
each pair of nodes communicates over a pair of independent
binary symmetric channels (BSCs), and obtained tight lower
bounds on the number of serial broadcasts using an approach
tailored to that special problem. The technique used in [3]
has been extended to random planar networks by Dutta et
al. [4]. Other related, but differently formulated, problems
include communication complexity and information complexity
in distributed computing over noiseless channels, surveyed
in [5]; minimum communication rates for distributed
computing [6]-[8], compression, or estimation based on infinite
sequences of observations, surveyed in [9, Chap. 21]; and
distributed computing in wireless networks, surveyed in [10].
Some achievability results for specific distributed function
computation problems can be found in [1], [11]-[18].

B. Method of analysis and summary of main results 

Our analysis builds upon the information-theoretic framework
proposed by Ayaso et al. [1] and Como and Dahleh [2].
The underlying idea is rather natural and exploits a fundamental
trade-off between the minimal amount of information any
good algorithm must necessarily extract about the function
value $Z$ when it terminates and the maximal amount of
information any algorithm is able to obtain due to time and
communication constraints. To be more precise, given any set
of nodes $\mathcal{S} \subseteq \mathcal{V}$, let $W_\mathcal{S} = (W_v)_{v \in \mathcal{S}}$ denote the vector of
observations at all the nodes in $\mathcal{S}$. The quantity that plays a
key role in the analysis is the conditional mutual information
$I(Z; \widehat{Z}_v | W_\mathcal{S})$ between the actual function value $Z$ and the
estimate $\widehat{Z}_v$ at an arbitrary node $v$, given the observations in
an arbitrary subset of nodes $\mathcal{S}$ containing $v$.

Consider an arbitrary $T$-step algorithm $\mathcal{A}$ that achieves
accuracy $\varepsilon$ with confidence $1 - \delta$. Then, as we show in
Lemma 1 of Sec. II-A, this mutual information can be lower-bounded by

$$I(Z; \widehat{Z}_v | W_\mathcal{S}) \ge (1 - \delta) \log \frac{1}{\mathbb{E}[L(W_\mathcal{S}, \varepsilon)]} - h_2(\delta), \tag{2}$$

where $h_2(\delta) = -\delta \log \delta - (1 - \delta) \log(1 - \delta)$ is the binary
entropy function, and

$$L(w_\mathcal{S}, \varepsilon) = \sup_{z \in \mathsf{Z}} \mathbb{P}\big[d(Z, z) \le \varepsilon \,\big|\, W_\mathcal{S} = w_\mathcal{S}\big] = \sup_{z \in \mathsf{Z}} \mathbb{P}\big[d(f(W), z) \le \varepsilon \,\big|\, W_\mathcal{S} = w_\mathcal{S}\big]$$

is the conditional small ball probability of $Z = f(W)$ given
$W_\mathcal{S} = w_\mathcal{S}$. The conditional small ball probability quantifies
the difficulty of localizing the value of $Z = f(W)$ in a
"distortion ball" of size $\varepsilon$ given partial knowledge about the
value of $W$, namely $W_\mathcal{S} = w_\mathcal{S}$. For example, as discussed in
Sec. IV, when $f$ is a linear function of the observations $W$, the
conditional small ball probability can be expressed in terms
of so-called Lévy concentration functions [19], for which tight
estimates are available under various regularity conditions.

Fig. 1: A four-node network with a cut defined by $\mathcal{S} = \{2,3\}$
and $\mathcal{S}^c = \{1,4\}$. The cutset $\mathcal{E}_\mathcal{S}$ consists of edges $(1,2)$ and
$(4,3)$, marked in blue.
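The right-hand side of (2) is easy to evaluate once the expected small ball probability is known. The sketch below does so under assumptions that are ours, not the paper's: logarithms are natural (so information is measured in nats), and the illustrative case is a real-valued $Z$ that is uniform on $[0,1]$ given $W_\mathcal{S}$, with $d(z, z') = |z - z'|$, for which the small ball probability equals $\min(2\varepsilon, 1)$.

```python
import math

def binary_entropy(delta):
    """h_2(delta) in nats, with the convention 0 log 0 = 0."""
    if delta in (0.0, 1.0):
        return 0.0
    return -delta * math.log(delta) - (1 - delta) * math.log(1 - delta)

def info_lower_bound(delta, expected_small_ball):
    """Right-hand side of (2): (1 - delta) * log(1 / E[L]) - h_2(delta)."""
    return (1 - delta) * math.log(1.0 / expected_small_ball) - binary_entropy(delta)

def uniform_small_ball(eps):
    """Small ball probability of a Uniform[0,1] variable under |z - z'| (assumed example)."""
    return min(2.0 * eps, 1.0)

# The required information grows like log(1/eps) as the accuracy is tightened.
bounds = [info_lower_bound(0.05, uniform_small_ball(eps)) for eps in (1e-1, 1e-2, 1e-3)]
```

Each tenfold reduction of $\varepsilon$ in this example raises the bound by $(1 - \delta)\log 10$ nats, which is the $\log(1/\varepsilon)$ scaling discussed later in this section.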

On the other hand, if $\mathcal{A}$ is a $T$-step algorithm, then the
amount of information any node $v$ has about $Z$ once $\mathcal{A}$
terminates can be upper-bounded by a quantity that increases
with $T$ and also depends on the network topology and on the
information transmission capabilities of the channels connecting
the nodes. To quantify this amount of information, we
consider a cut of the network, i.e., a partition of the set of
nodes $\mathcal{V}$ into two disjoint subsets $\mathcal{S}$ and $\mathcal{S}^c = \mathcal{V} \setminus \mathcal{S}$, such
that $v \in \mathcal{S}$. The underlying intuition is that any information
that nodes in $\mathcal{S}$ receive about $W_{\mathcal{S}^c}$ must flow across the
edges from nodes in $\mathcal{S}^c$ to nodes in $\mathcal{S}$. The set of these
edges, denoted by $\mathcal{E}_\mathcal{S}$, is referred to as the cutset induced by
$\mathcal{S}$. Figure 1 illustrates these concepts on a simple four-node
network. We then have the following upper bound [1], [2] (see
also Lemma 2 in Sec. II-B):

$$I(Z; \widehat{Z}_v | W_\mathcal{S}) \le T C_\mathcal{S}. \tag{3}$$

The quantity $C_\mathcal{S}$, referred to as the cutset capacity, is the sum
of the Shannon capacities of all the channels located on the
edges in the cutset $\mathcal{E}_\mathcal{S}$. Thus, if there exists a cut $(\mathcal{S}, \mathcal{S}^c)$ with
a small value of $C_\mathcal{S}$, then the amount of information gained
by the nodes in $\mathcal{S}$ about $Z$ will also be small. Note that the
cutset upper bound grows linearly with $T$. However, when the
initial observations $W$ are discrete, we also know that

$$I(Z; \widehat{Z}_v | W_\mathcal{S}) \le I(W_{\mathcal{S}^c}; \widehat{Z}_v | W_\mathcal{S}) \le H(W_{\mathcal{S}^c} | W_\mathcal{S}),$$

where $H(W_{\mathcal{S}^c} | W_\mathcal{S})$ is the conditional entropy of $W_{\mathcal{S}^c}$ given
$W_\mathcal{S}$, which does not depend on $T$. In fact, we sharpen this
bound by showing in Lemma 5 in Sec. II-D that

$$I(Z; \widehat{Z}_v | W_\mathcal{S}) \le \big(1 - (1 - \eta_v)^T\big) H(W_{\mathcal{S}^c} | W_\mathcal{S}). \tag{4}$$

Here, $\eta_v$ is defined as

$$\eta_v = \sup \frac{I(U; Y_v)}{I(U; X_v)},$$

where the supremum is over all triples $(U, X_v, Y_v)$ of r.v.'s
such that $U$ takes values in an arbitrary alphabet, $U \to X_v \to Y_v$
is a Markov chain, $X_v$ takes values in $\mathsf{X}_{v\leftarrow}$, $Y_v$ takes values
in $\mathsf{Y}_{v\leftarrow}$, and the conditional probability law $\mathbb{P}_{Y_v | X_v}$ is equal
to the product of all the channels entering $v$. As we discuss in
detail in Sec. II-C, this constant is related to so-called strong
data processing inequalities (SDPIs) [20], and quantifies the
information transmission capabilities of the channels entering
$v$. When $\eta_v < 1$, the upper bound (4) is strictly smaller than
$H(W_{\mathcal{S}^c} | W_\mathcal{S})$. With the upper bound (4), we can strengthen
the cutset bound to the following:

$$I(Z; \widehat{Z}_v | W_\mathcal{S}) \le \min\Big\{ T C_\mathcal{S},\ \big(1 - (1 - \eta_v)^T\big) H(W_{\mathcal{S}^c} | W_\mathcal{S}) \Big\}. \tag{5}$$


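Combining the bounds in (2) and (5), we conclude that, if
there exists a $T$-step algorithm $\mathcal{A}$ that achieves accuracy $\varepsilon$
with confidence $1 - \delta$, then

$$T \ge \max\left\{ \frac{1}{C_\mathcal{S}} \left( (1 - \delta) \log \frac{1}{\mathbb{E}[L(W_\mathcal{S}, \varepsilon)]} - h_2(\delta) \right),\ \frac{\log\left( 1 - \dfrac{1}{H(W_{\mathcal{S}^c} | W_\mathcal{S})} \left( (1 - \delta) \log \dfrac{1}{\mathbb{E}[L(W_\mathcal{S}, \varepsilon)]} - h_2(\delta) \right) \right)}{\log(1 - \eta_v)} \right\}; \tag{6}$$

moreover, this inequality holds for all choices of $\mathcal{S} \subseteq \mathcal{V}$ and
$v \in \mathcal{S}$. The precise statements of the resulting lower bounds
on $T(\varepsilon, \delta)$ are given in Theorem 1 and Theorem 2.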
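As a hedged numerical illustration of how (2) and (5) combine into (6), the sketch below evaluates both terms of the maximum for discrete observations. The function name, the use of natural logarithms, and the handling of degenerate cases (for instance returning infinity when the information requirement exceeds $H(W_{\mathcal{S}^c} | W_\mathcal{S})$) are our own choices. The example parameters mimic a two-node network with independent uniform bits, $Z = W_1 \oplus W_2$, $\varepsilon = 0$, and a single BSC(0.1) entering the node of interest, so that $\mathbb{E}[L] = 1/2$ and $H(W_{\mathcal{S}^c} | W_\mathcal{S}) = \log 2$.

```python
import math

def h2(x):
    """Binary entropy in nats, with 0 log 0 = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def time_lower_bound(delta, expected_small_ball, cutset_capacity, eta, cond_entropy):
    """Evaluate the right-hand side of (6); all information quantities in nats."""
    # Information that any (eps, delta)-accurate algorithm must extract, from (2).
    need = (1 - delta) * math.log(1.0 / expected_small_ball) - h2(delta)
    if need <= 0:
        return 0.0
    capacity_term = need / cutset_capacity
    if eta >= 1.0:
        # The SDPI bound in (4) reduces to the entropy bound; no constraint on T.
        sdpi_term = 0.0 if need <= cond_entropy else math.inf
    elif eta <= 0.0 or need >= cond_entropy:
        # (4) can never reach the required amount of information.
        sdpi_term = math.inf
    else:
        sdpi_term = math.log(1 - need / cond_entropy) / math.log(1 - eta)
    return max(capacity_term, sdpi_term)

p = 0.1                                   # BSC crossover probability (assumed)
capacity = math.log(2) - h2(p)            # BSC capacity in nats
eta = (1 - 2 * p) ** 2                    # SDPI constant of BSC(p), see Sec. II-C
bound = time_lower_bound(0.01, 0.5, capacity, eta, math.log(2))
```

In this example the SDPI-based term is the larger of the two, which reflects the saturation of (4) at $H(W_{\mathcal{S}^c} | W_\mathcal{S})$ as the confidence requirement tightens.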

The lower bound in (6) accounts for the difficulty of
estimating the value of $Z = f(W)$ given only a subset of
observations $W_\mathcal{S}$ through the small ball probability $L(W_\mathcal{S}, \varepsilon)$,
and for the communication bottlenecks in the network through
the cutset capacity $C_\mathcal{S}$ and the constants $\eta_v$. The presence of
$L(W_\mathcal{S}, \varepsilon)$ in the bound ensures the correct scaling of $T(\varepsilon, \delta)$ in
the high-accuracy limit $\varepsilon \to 0$. In particular, when the function
$f$ is real-valued and the probability distribution of $Z = f(W)$
has a density, it is not hard to see that $L(W_\mathcal{S}, \varepsilon) = O(\varepsilon)$, and
therefore $T(\varepsilon, \delta)$ grows without bound at the rate of $\log(1/\varepsilon)$
as $\varepsilon \to 0$. By contrast, the bounds of Ayaso et al. [1] saturate
at a finite constant even when no computation error is allowed,
i.e., when $\varepsilon = 0$. A detailed comparison with existing bounds
is given in Sec. IV, where we particularize our lower bounds
to the computation of linear functions. Moreover, in certain
cases our lower bound on $T(\varepsilon, \delta)$ tends to infinity in the
high-confidence regime $\delta \to 0$. By contrast, existing lower
bounds that rely on cutset capacity estimates remain bounded
regardless of how small we make $\delta$.

Throughout the paper, we provide several concrete examples
that illustrate the tightness of the general lower bound in (6).



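Fig. 2: A six-node network partitioned into three sets, $\mathcal{S}_1 = \{1,4\}$,
$\mathcal{S}_2 = \{2,5\}$, and $\mathcal{S}_3 = \{3,6\}$. Here, $\mathcal{P}_1 = \{1,4\}$,
$\mathcal{P}_2 = \{1,2,4,5\}$, and the cutsets $\mathcal{E}_{\mathcal{P}_1} = \{(2,1), (2,4)\}$,
$\mathcal{E}_{\mathcal{P}_2} = \{(3,2), (6,5)\}$, $\mathcal{E}_{\mathcal{P}_1^c} = \{(1,5), (4,5)\}$, and
$\mathcal{E}_{\mathcal{P}_2^c} = \{(2,3), (5,6)\}$ are disjoint. Observe that nodes in $\mathcal{S}_1$
communicate only with nodes in $\mathcal{S}_1$ and $\mathcal{S}_2$, nodes in $\mathcal{S}_2$ communicate
only with nodes in $\mathcal{S}_1$, $\mathcal{S}_2$, $\mathcal{S}_3$, and nodes in $\mathcal{S}_3$ communicate
only with nodes in $\mathcal{S}_2$, $\mathcal{S}_3$. The bidirected chain reduced from
the network is shown on the right.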


In particular, Example 1 in Sec. II-E concerns the problem
of computing the mod-2 sum of two independent Bern(1/2)
random variables in a network of two nodes communicating
over binary symmetric channels (BSCs). For that problem, we
obtain a lower bound on $T(0, \delta)$ that matches an achievable
upper bound within a factor of 2. In Example 2 in Sec. II-E,
we consider the case where the nodes aim to distribute their
discrete observations to all other nodes, and obtain a lower
bound on $T(0, \delta)$ that captures the conductance of the network,
which plays a prominent role in the previously published
bounds of Ayaso et al. [1]. In Sec. V, we study two more
examples: computing a sum of independent Rademacher random
variables in a dumbbell network of BSCs, and distributed
averaging of real-valued observations in an arbitrary network
of binary erasure channels (BECs). Our lower bound for the
former example can precisely capture the dependence of the
computation time on the number of nodes in the network,
while for the latter example it captures the correct dependence
of the computation time on the accuracy parameter $\varepsilon$.

A significant limitation of the analysis based on a single cut
$(\mathcal{S}, \mathcal{S}^c)$ of the network is that it only captures the flow of
information across the cutset $\mathcal{E}_\mathcal{S}$, but does not account for the time
it takes the algorithm to disseminate this information to all
the nodes in $\mathcal{S}$. We address this limitation in Sec. III through
a multi-cutset analysis. The main idea is to partition the set
of nodes $\mathcal{V}$ into several subsets $\mathcal{S}_1, \ldots, \mathcal{S}_n$ such that, with
$\mathcal{P}_i = \mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i$, the cutsets
$\mathcal{E}_{\mathcal{P}_1}, \ldots, \mathcal{E}_{\mathcal{P}_{n-1}}, \mathcal{E}_{\mathcal{P}_1^c}, \ldots, \mathcal{E}_{\mathcal{P}_{n-1}^c}$
are disjoint, and to analyze the flow of information across
this sequence of cutsets. Once such a partition is selected, the
analysis is based on a network reduction argument (Lemma 7),
which lumps all the nodes in each $\mathcal{S}_i$ into a single virtual
"supernode." The construction of the partition ensures that
each supernode $i$ only communicates with supernodes $i - 1$
and $i + 1$, and can also send noisy messages to itself (this
is needed to simulate noisy communication among the nodes
within $\mathcal{S}_i$ in the original network). Thus, the reduced network
takes the form of a chain with $n$ nodes communicating with
their nearest neighbors over bidirectional noisy links and, in
addition, sending noisy messages to themselves. We refer to
this network as a bidirected chain of length $n - 1$. Figure 2a
shows the partition of a six-node network, and the bidirected
chain reduced from this network is shown in Fig. 2b.
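The conditions on the partition can be checked mechanically. The sketch below is an illustration under assumptions of ours: the helper names are hypothetical, and the edge set contains only the eight cross edges listed in the caption of Fig. 2 (any edges inside a block $\mathcal{S}_i$ would not affect either check). It verifies that every edge joins nodes in the same or in adjacent blocks, and that the cutsets induced by the prefixes $\mathcal{P}_i$ and by their complements are pairwise disjoint.

```python
from itertools import combinations

def cutset(edges, S):
    """Edges (u, v) that enter the node set S from its complement."""
    return {(u, v) for (u, v) in edges if u not in S and v in S}

def is_chain_partition(edges, parts):
    """True if every edge joins nodes in the same block or in adjacent blocks."""
    index = {v: i for i, part in enumerate(parts) for v in part}
    return all(abs(index[u] - index[v]) <= 1 for (u, v) in edges)

def prefix_cutsets(edges, parts):
    """Cutsets of P_i = S_1 u ... u S_i and of its complement, for i = 1, ..., n-1."""
    nodes = set().union(*parts)
    prefix, result = set(), []
    for part in parts[:-1]:
        prefix |= set(part)
        result.append(cutset(edges, prefix))
        result.append(cutset(edges, nodes - prefix))
    return result

def pairwise_disjoint(sets):
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))

# Cross edges read off the caption of Fig. 2 (intra-block edges omitted).
edges = {(2, 1), (2, 4), (3, 2), (6, 5), (1, 5), (4, 5), (2, 3), (5, 6)}
parts = [{1, 4}, {2, 5}, {3, 6}]
cuts = prefix_cutsets(edges, parts)
```

Adding a hypothetical edge such as $(1,3)$ would connect non-adjacent blocks, and `is_chain_partition` would then reject the partition.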

Once this reduction is carried out, we can convert any $T$-step
algorithm $\mathcal{A}$ running on the original network into a randomized
$T$-step algorithm $\mathcal{A}'$ running on the reduced network with the
same accuracy and confidence guarantees as $\mathcal{A}$. Consequently,
it suffices to analyze distributed function computation in
bidirected chains. The key quantitative statement that emerges
from this analysis can be informally stated as follows: For
any bidirected chain with $n \ge 3$ nodes, there exists a constant
$\eta \in [0,1]$ that plays the same role as $\eta_v$ in (4) and quantifies
the information transmission capabilities of the channels in
the chain, such that, for any algorithm $\mathcal{A}$ that runs on this
chain and takes time $T = O(n/\eta)$, the conditional mutual
information between the function value $Z$ and its estimate $\widehat{Z}_n$
at the rightmost node $n$ given the observations of nodes 2
through $n$ is upper-bounded by

I{Z- Z n \W 2:n ) = O ^ (1 ^ 2 . e - 2 "’) 2 j , (7) 

where $C_{(1,2)}$ is the Shannon capacity of the channel from node
1 to node 2. The precise statement is given in Lemma 8 in
Sec. III-A. Intuitively, this shows that, unless the algorithm
uses $\Omega(n/\eta)$ steps, the information about $W_1$ will dissipate
at an exponential rate by the time it propagates through the
chain from node 1 to node $n$. Combining (7) with the lower
bound on $I(Z; \widehat{Z}_n | W_{2:n})$ based on small ball probabilities,
we can obtain lower bounds on the computation time $T(\varepsilon, \delta)$.
The precise statement is given in Theorem 3. Moreover, as
we show, it is always possible to reduce an arbitrary network
with bidirectional point-to-point channels between the nodes
to a bidirected chain whose length is equal to the diameter
of the original network, which implies that, for networks with
sufficiently large diameter, and for sufficiently small values of
$\varepsilon$, $\delta$,

where $\mathrm{diam}(G)$ denotes the diameter. This dependence on
$\mathrm{diam}(G)$, which cannot be captured by the single-cutset analysis,
is missing in almost all of the existing lower bounds on
computation time. An exception is the paper by Rajagopalan
and Schulman [13] that gives an asymptotic lower bound
on the time required to broadcast a single bit over a chain
of unidirectional BSCs. Our multi-cutset analysis applies to
both discrete and continuous observations, and to general
network topologies. It can be straightforwardly particularized
to specific networks, such as bidirected chains, rings, trees,
and grids, as discussed in Sec. III-B. We note that techniques
involving multiple (though not necessarily disjoint) cutsets
have also been proposed in the study of multi-party communication
complexity by Tiwari [21] and more recently by
Chattopadhyay et al. [22], while our concern is the influence
of network topology and channel noise on the computation
time.

C. Organization of the paper 

The remainder of the paper is structured as follows. We start
with the single-cutset analysis in Sec. II. The lower bound
on the conditional mutual information via the conditional
small ball probability is presented in Sec. II-A. The cutset
upper bound and the SDPI upper bound on the conditional
mutual information are presented in Sec. II-B and Sec. II-D.
An introduction to SDPIs is given in Sec. II-C. The lower
bound on computation time is given in Sec. II-E, along with
two concrete examples. Sec. III is devoted to the multi-cutset
analysis, where we first present the network reduction
argument in Sec. III-A, then derive general lower bounds
on computation time and particularize the results to special
networks in Sec. III-B. In Sec. IV, we discuss lower bounds
for computing linear functions, where we relate the conditional
small ball probability to Lévy concentration functions, and
evaluate them in a number of special cases. We also make
detailed comparisons of our results with existing lower bounds
in Sec. IV-D. In Sec. V, we compare the lower bounds on
computation time with the achievable upper bounds for two
more examples: computing a sum of independent Rademacher
random variables in a dumbbell network of BSCs, and distributed
averaging of real-valued observations in an arbitrary
network of binary erasure channels (BECs). We conclude this
paper and point out future research directions in Sec. VI. A
couple of lengthy technical proofs are relegated to a series of
appendices.

II. Single-cutset analysis 

We start by deriving information-theoretic lower bounds on
the computation time $T(\varepsilon, \delta)$ based on a single cutset in the
network. Recall that the cutset associated with a partition of $\mathcal{V}$ into
two disjoint sets $\mathcal{S}$ and $\mathcal{S}^c = \mathcal{V} \setminus \mathcal{S}$ consists of all edges that
connect a node in $\mathcal{S}^c$ to a node in $\mathcal{S}$:

$$\mathcal{E}_\mathcal{S} = \{(u,v) \in \mathcal{E} : u \in \mathcal{S}^c, v \in \mathcal{S}\} = (\mathcal{S}^c \times \mathcal{S}) \cap \mathcal{E}.$$

When $\mathcal{S}$ is a singleton, i.e., $\mathcal{S} = \{v\}$, we will write $\mathcal{E}_v$
instead of the more clunky $\mathcal{E}_{\{v\}}$. As the discussion in Sec. I-B
indicates, our analysis revolves around the conditional mutual
information $I(Z; \widehat{Z}_v | W_\mathcal{S})$ for an arbitrary set of nodes $\mathcal{S} \subseteq \mathcal{V}$
and for an arbitrary node $v \in \mathcal{S}$. The lower bound on
$I(Z; \widehat{Z}_v | W_\mathcal{S})$ expresses quantitatively the intuition that any
algorithm that achieves

$$\max_{v \in \mathcal{V}} \mathbb{P}\big[d(Z, \widehat{Z}_v) > \varepsilon\big] \le \delta$$

must necessarily extract a sufficient amount of information
about the value of $Z = f(W) = f(W_\mathcal{S}, W_{\mathcal{S}^c})$. On the other
hand, the upper bounds on $I(Z; \widehat{Z}_v | W_\mathcal{S})$ formalize the idea
that this amount cannot be too large, since any information
that nodes in $\mathcal{S}$ receive about $W_{\mathcal{S}^c}$ must flow across the edges
in the cutset $\mathcal{E}_\mathcal{S}$ (cf. [23, Sec. 15.10] for a typical illustration
of this type of cutset argument). We capture this information
limitation in two ways: via channel capacity and via SDPI
constants.

The remainder of this section is organized as follows. We
first present conditional mutual information lower bounds in
Sec. II-A. Then we state the upper bound based on cutset
capacity in Sec. II-B. After a brief detour to introduce the
SDPIs in Sec. II-C, we state the SDPI-based upper bounds in
Sec. II-D. Finally, we combine the lower and upper bounds to
derive lower bounds on $T(\varepsilon, \delta)$ in Sec. II-E.


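A. Lower bound on $I(Z; \widehat{Z}_v | W_\mathcal{S})$

For any $\varepsilon \ge 0$, $\mathcal{S} \subseteq \mathcal{V}$, and $w_\mathcal{S} \in \prod_{v \in \mathcal{S}} \mathsf{W}_v$, define the
conditional small ball probability of $Z$ given $W_\mathcal{S} = w_\mathcal{S}$ as

$$L(w_\mathcal{S}, \varepsilon) = \sup_{z \in \mathsf{Z}} \mathbb{P}\big[d(Z, z) \le \varepsilon \,\big|\, W_\mathcal{S} = w_\mathcal{S}\big]. \tag{9}$$

This quantity measures how well the conditional distribution
of $Z$ given $W_\mathcal{S} = w_\mathcal{S}$ concentrates in a small region of
size $\varepsilon$ as measured by $d(\cdot, \cdot)$. The following lower bound on
$I(Z; \widehat{Z}_v | W_\mathcal{S})$ in terms of the conditional small ball probability
is essential for proving lower bounds on $T(\varepsilon, \delta)$.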

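Lemma 1. If an algorithm $\mathcal{A}$ achieves

$$\max_{v \in \mathcal{V}} \mathbb{P}\big[d(Z, \widehat{Z}_v) > \varepsilon\big] \le \delta \le 1/2, \tag{10}$$

then for any set $\mathcal{S} \subseteq \mathcal{V}$ and any node $v \in \mathcal{S}$,

$$I(Z; \widehat{Z}_v | W_\mathcal{S}) \ge (1 - \delta) \log \frac{1}{\mathbb{E}[L(W_\mathcal{S}, \varepsilon)]} - h_2(\delta), \tag{11}$$

where $h_2(\delta) = -\delta \log \delta - (1 - \delta) \log(1 - \delta)$ is the binary
entropy function.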


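Proof: Fix an arbitrary $\mathcal{S} \subseteq \mathcal{V}$ and an arbitrary $v \in \mathcal{S}$.
Consider the probability distributions $P = \mathbb{P}_{W_\mathcal{S}, Z, \widehat{Z}_v}$ and
$Q = \mathbb{P}_{W_\mathcal{S}} \otimes \mathbb{P}_{Z | W_\mathcal{S}} \otimes \mathbb{P}_{\widehat{Z}_v | W_\mathcal{S}}$. Define the indicator random variable
$\Upsilon = \mathbf{1}\{d(Z, \widehat{Z}_v) \le \varepsilon\}$. Then from (10) it follows that
$P[\Upsilon = 1] \ge 1 - \delta$. On the other hand, since $Z \to W_\mathcal{S} \to \widehat{Z}_v$ form a
Markov chain under $Q$, by Fubini's theorem,

$$\begin{aligned}
Q[\Upsilon = 1] &= \int_{\mathsf{W}_\mathcal{S}} \int_{\mathsf{Z}} \int_{\mathsf{Z}} \mathbf{1}\{d(z, \hat{z}_v) \le \varepsilon\}\, \mathbb{P}(dz | w_\mathcal{S})\, \mathbb{P}(d\hat{z}_v | w_\mathcal{S})\, \mathbb{P}(dw_\mathcal{S}) \\
&= \int_{\mathsf{W}_\mathcal{S}} \int_{\mathsf{Z}} \mathbb{P}\big[d(Z, \hat{z}_v) \le \varepsilon \,\big|\, W_\mathcal{S} = w_\mathcal{S}\big]\, \mathbb{P}(d\hat{z}_v | w_\mathcal{S})\, \mathbb{P}(dw_\mathcal{S}) \\
&\le \int_{\mathsf{W}_\mathcal{S}} \sup_{z \in \mathsf{Z}} \mathbb{P}\big[d(Z, z) \le \varepsilon \,\big|\, W_\mathcal{S} = w_\mathcal{S}\big]\, \mathbb{P}(dw_\mathcal{S}) \\
&= \mathbb{E}[L(W_\mathcal{S}, \varepsilon)].
\end{aligned} \tag{12}$$

Consequently,

$$\begin{aligned}
I(Z; \widehat{Z}_v | W_\mathcal{S}) &= D(P \| Q) \\
&\overset{(a)}{\ge} d_2\big(P[\Upsilon = 1] \,\big\|\, Q[\Upsilon = 1]\big) \\
&\overset{(b)}{\ge} P[\Upsilon = 1] \log \frac{1}{Q[\Upsilon = 1]} - h_2\big(P[\Upsilon = 1]\big) \\
&\overset{(c)}{\ge} (1 - \delta) \log \frac{1}{\mathbb{E}[L(W_\mathcal{S}, \varepsilon)]} - h_2(\delta),
\end{aligned}$$

where

(a) follows from the data processing inequality for divergence,
where $d_2(p \| q) = p \log(p/q) + (1 - p) \log((1 - p)/(1 - q))$
is the binary divergence function;

(b) follows from the fact that $d_2(p \| q) \ge p \log(1/q) - h_2(p)$;

(c) follows from the fact that $P[\Upsilon = 1] \ge 1 - \delta \ge 1/2$ by
(10), and $Q[\Upsilon = 1] \le \mathbb{E}[L(W_\mathcal{S}, \varepsilon)]$ by (12).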


For a fixed $\varepsilon$, Lemma 1 captures the intuition that the
more spread out the conditional distribution $\mathbb{P}_{Z | W_\mathcal{S}}$ is, the more
information we need about $Z$ to achieve the required accuracy;
similarly, for a fixed $\mathbb{P}_{Z | W_\mathcal{S}}$, the smaller the accuracy
parameter $\varepsilon$, the more information is necessary. In Section IV,
we provide explicit expressions and upper bounds for the
conditional small ball probability $L(w_\mathcal{S}, \varepsilon)$ in the context of
computing linear functions of real-valued r.v.'s with absolutely
continuous probability distributions. We show that, in such
cases, $L(w_\mathcal{S}, \varepsilon) = O(\varepsilon)$, which implies that the lower bound
of Lemma 1 grows at least as fast as $\log(1/\varepsilon)$ in the
high-accuracy limit $\varepsilon \to 0$.
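Step (b) in the proof of Lemma 1 rests on the elementary inequality $d_2(p \| q) \ge p \log(1/q) - h_2(p)$, whose slack is exactly $(1 - p) \log \frac{1}{1 - q} \ge 0$. The short sketch below checks this numerically on a grid; it is only a sanity check with natural logarithms assumed, not a substitute for the one-line algebraic argument.

```python
import math

def h2(p):
    """Binary entropy in nats, with 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def d2(p, q):
    """Binary divergence d_2(p || q) in nats for 0 < q < 1, with 0 log 0 = 0."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

grid = [i / 20 for i in range(21)]
# Slack of the inequality d_2(p || q) >= p log(1/q) - h_2(p) over the grid.
gaps = [d2(p, q) - (p * math.log(1 / q) - h2(p)) for p in grid for q in grid[1:-1]]
smallest_gap = min(gaps)
```

The smallest slack on the grid is attained at $p = 1$, where the two sides coincide up to rounding.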


B. Upper bound on $I(Z; \widehat{Z}_v | W_\mathcal{S})$ via cutset capacity

Our first upper bound involves the cutset capacity $C_\mathcal{S}$,
defined as

$$C_\mathcal{S} = \sum_{e \in \mathcal{E}_\mathcal{S}} C_e.$$

Here, $C_e$ denotes the Shannon capacity of the channel $K_e$.
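As a small illustration of this definition, the sketch below computes $C_\mathcal{S}$ for a network of BSCs. The edge set and crossover probabilities are hypothetical; they are chosen only so that, for $\mathcal{S} = \{2,3\}$, the cutset consists of the edges $(1,2)$ and $(4,3)$ as in the caption of Fig. 1. Capacities are in nats, using the standard formula $C = \log 2 - h_2(p)$ for BSC($p$).

```python
import math

def bsc_capacity(p):
    """Shannon capacity of BSC(p) in nats: log 2 - h_2(p)."""
    if p in (0.0, 1.0):
        return math.log(2)
    return math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)

def cutset_capacity(crossover, S):
    """Sum of the capacities of the channels on edges entering S from its complement.

    `crossover` maps each directed edge (u, v) to the crossover probability
    of the BSC on that edge.
    """
    return sum(bsc_capacity(p) for (u, v), p in crossover.items()
               if u not in S and v in S)

# Hypothetical four-node network consistent with the cut shown in Fig. 1.
crossover = {
    (1, 2): 0.1, (2, 1): 0.1,
    (2, 3): 0.05, (3, 2): 0.05,
    (4, 3): 0.2, (3, 4): 0.2,
}
C_S = cutset_capacity(crossover, {2, 3})
```

Only the two edges entering $\mathcal{S}$ contribute; the edges inside $\mathcal{S}$ and those leaving it are ignored.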

Lemma 2. For any set $\mathcal{S} \subseteq \mathcal{V}$, let $\widehat{Z}_\mathcal{S} = (\widehat{Z}_v)_{v \in \mathcal{S}}$. Then, for
any $T$-step algorithm $\mathcal{A}$ and for any $v \in \mathcal{S}$,

$$I(Z; \widehat{Z}_v | W_\mathcal{S}) \le I(Z; \widehat{Z}_\mathcal{S} | W_\mathcal{S}) \le T C_\mathcal{S}.$$

Proof: The first inequality follows from the data processing
lemma for mutual information. The second inequality has
been obtained in [1] and [2] as well, but the proof in [1] relies
heavily on differential entropy. Our proof is more general, as
it only uses the properties of mutual information.

For a set of nodes $\mathcal{S} \subseteq \mathcal{V}$, let $X_{\mathcal{S},t} = (X_{v,t})_{v \in \mathcal{S}}$ and
$Y_{\mathcal{S},t} = (Y_{v,t})_{v \in \mathcal{S}}$. For two subsets $\mathcal{S}_1$ and $\mathcal{S}_2$ of $\mathcal{V}$, define
$X_{(\mathcal{S}_1, \mathcal{S}_2),t} = (X_{(u,v),t} : u \in \mathcal{S}_1, v \in \mathcal{S}_2, (u,v) \in \mathcal{E})$ as the
messages sent from nodes in $\mathcal{S}_1$ to nodes in $\mathcal{S}_2$ at step $t$, and
$Y_{(\mathcal{S}_1, \mathcal{S}_2),t} = (Y_{(u,v),t} : u \in \mathcal{S}_1, v \in \mathcal{S}_2, (u,v) \in \mathcal{E})$ as the
messages received by nodes in $\mathcal{S}_2$ from nodes in $\mathcal{S}_1$ at step
$t$. We will be using this notation in the proofs that follow, as
well.

If $T = 0$, then for any $v \in \mathcal{S}$, $\widehat{Z}_v = \psi_v(W_v)$, hence
$I(Z; \widehat{Z}_\mathcal{S} | W_\mathcal{S}) \le I(Z; W_\mathcal{S} | W_\mathcal{S}) = 0$. For $T \ge 1$, we start
with the following chain of inequalities:

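$$\begin{aligned}
I(Z; \widehat{Z}_\mathcal{S} | W_\mathcal{S}) &\overset{(a)}{\le} I(W_\mathcal{S}, W_{\mathcal{S}^c}; W_\mathcal{S}, Y_\mathcal{S}^T | W_\mathcal{S}) \\
&= I(W_{\mathcal{S}^c}; Y_\mathcal{S}^T | W_\mathcal{S}) \\
&= \sum_{t=1}^T I(W_{\mathcal{S}^c}; Y_{\mathcal{S},t} | W_\mathcal{S}, Y_\mathcal{S}^{t-1}) \\
&\overset{(b)}{=} \sum_{t=1}^T I(W_{\mathcal{S}^c}; Y_{\mathcal{S},t} | W_\mathcal{S}, Y_\mathcal{S}^{t-1}, X_{\mathcal{S},t}) \\
&\le \sum_{t=1}^T I(W_{\mathcal{S}^c}, X_{\mathcal{S}^c,t}; Y_{\mathcal{S},t} | W_\mathcal{S}, Y_\mathcal{S}^{t-1}, X_{\mathcal{S},t}) \\
&= \sum_{t=1}^T \Big( I(X_{\mathcal{S}^c,t}; Y_{\mathcal{S},t} | W_\mathcal{S}, Y_\mathcal{S}^{t-1}, X_{\mathcal{S},t}) + I(W_{\mathcal{S}^c}; Y_{\mathcal{S},t} | W_\mathcal{S}, Y_\mathcal{S}^{t-1}, X_{\mathcal{S},t}, X_{\mathcal{S}^c,t}) \Big) \\
&\overset{(c)}{=} \sum_{t=1}^T I(X_{\mathcal{S}^c,t}; Y_{\mathcal{S},t} | W_\mathcal{S}, Y_\mathcal{S}^{t-1}, X_{\mathcal{S},t}) \\
&\overset{(d)}{\le} \sum_{t=1}^T I(X_{\mathcal{S}^c,t}; Y_{\mathcal{S},t} | X_{\mathcal{S},t}),
\end{aligned} \tag{13}$$

where

(a) follows from the data processing inequality, and the fact that
$Z = f(W_\mathcal{S}, W_{\mathcal{S}^c})$ and $\widehat{Z}_v = \psi_v(W_v, Y_v^T)$;

(b) follows from the fact that $X_{v,t} = \varphi_{v,t}(W_v, Y_v^{t-1})$;

(c) follows from the memorylessness of the channels, hence
the Markov chain $W_{\mathcal{S}^c}, W_\mathcal{S}, Y_\mathcal{S}^{t-1} \to X_{\mathcal{S},t}, X_{\mathcal{S}^c,t} \to Y_{\mathcal{S},t}$,
and the weak union property of conditional independence [24, p. 25];

(d) follows from the Markov chain

$$W_\mathcal{S}, Y_\mathcal{S}^{t-1} \to X_{\mathcal{S},t}, X_{\mathcal{S}^c,t} \to Y_{\mathcal{S},t}$$

together with the fact that, if $X \to A, B \to C$ form a
Markov chain, then

$$I(A; C | X, B) \le I(A; C | B).$$

To prove this, we expand $I(A, X; C | B)$ in two ways
to get

$$\begin{aligned}
I(A, X; C | B) &= I(X; C | B) + I(A; C | X, B) \\
&= I(A; C | B) + I(X; C | A, B).
\end{aligned}$$

The claim follows because $I(X; C | A, B) = 0$ and $I(X; C | B) \ge 0$.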

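From now on we drop the step index $t$ and denote $X_{(\mathcal{S}_1, \mathcal{S}_2),t}$ as
$X_{\mathcal{S}_1 \mathcal{S}_2}$ to simplify the notation. Note that $X_\mathcal{S} = (X_{\mathcal{S}\mathcal{S}}, X_{\mathcal{S}\mathcal{S}^c})$
and $Y_\mathcal{S} = (Y_{\mathcal{S}\mathcal{S}}, Y_{\mathcal{S}^c\mathcal{S}})$. We have

$$\begin{aligned}
I(X_{\mathcal{S}^c}; Y_\mathcal{S} | X_\mathcal{S}) &= I(X_{\mathcal{S}^c}; Y_{\mathcal{S}^c\mathcal{S}}, Y_{\mathcal{S}\mathcal{S}} | X_\mathcal{S}) \\
&= I(X_{\mathcal{S}^c}; Y_{\mathcal{S}^c\mathcal{S}} | X_\mathcal{S}) + I(X_{\mathcal{S}^c}; Y_{\mathcal{S}\mathcal{S}} | X_\mathcal{S}, Y_{\mathcal{S}^c\mathcal{S}}) \\
&\overset{(a)}{=} I(X_{\mathcal{S}^c\mathcal{S}}, X_{\mathcal{S}^c\mathcal{S}^c}; Y_{\mathcal{S}^c\mathcal{S}} | X_\mathcal{S}) \\
&= I(X_{\mathcal{S}^c\mathcal{S}}; Y_{\mathcal{S}^c\mathcal{S}} | X_\mathcal{S}) + I(X_{\mathcal{S}^c\mathcal{S}^c}; Y_{\mathcal{S}^c\mathcal{S}} | X_\mathcal{S}, X_{\mathcal{S}^c\mathcal{S}}) \\
&\overset{(b)}{\le} I(X_{\mathcal{S}^c\mathcal{S}}; Y_{\mathcal{S}^c\mathcal{S}}) \\
&\overset{(c)}{\le} \sum_{e \in \mathcal{E}_\mathcal{S}} C_e,
\end{aligned} \tag{14}$$

where

(a) follows from the Markov chain $X_{\mathcal{S}^c}, Y_{\mathcal{S}^c\mathcal{S}} \to X_\mathcal{S} \to Y_{\mathcal{S}\mathcal{S}}$
and the weak union property of conditional independence;

(b) follows from the Markov chains $X_\mathcal{S} \to X_{\mathcal{S}^c\mathcal{S}} \to Y_{\mathcal{S}^c\mathcal{S}}$
and $X_{\mathcal{S}^c\mathcal{S}^c}, X_\mathcal{S} \to X_{\mathcal{S}^c\mathcal{S}} \to Y_{\mathcal{S}^c\mathcal{S}}$, and the weak union
property of conditional independence;

(c) follows from the fact that the channels associated with
$\mathcal{E}_\mathcal{S}$ are independent, and the fact that the capacity of a
product channel is at most the sum of the capacities of
the constituent channels [25].

Then the statement of Lemma 2 follows from (13) and (14).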


C. Preliminaries on strong data processing inequalities 

In Sec. II-D, we will upper-bound $I(Z; \widehat{Z}_v | W_\mathcal{S})$ using so-called
strong data processing inequalities (SDPIs) for discrete
channels (cf. [20] and references therein). Here we provide
the necessary background. A discrete memoryless channel is
specified by a triple $(\mathsf{X}, \mathsf{Y}, K)$, where $\mathsf{X}$ is the input alphabet,
$\mathsf{Y}$ is the output alphabet, and $K = (K(y | x))_{(x,y) \in \mathsf{X} \times \mathsf{Y}}$
is the stochastic transition law. We say that the channel
$(\mathsf{X}, \mathsf{Y}, K)$ satisfies an SDPI at input distribution $P_X$ with
constant $c \in [0,1)$ if $D(Q_Y \| P_Y) \le c\, D(Q_X \| P_X)$ for any
other input distribution $Q_X$. Here $P_Y$ and $Q_Y$ denote the
marginal distributions of the channel output when the input
has distribution $P_X$ and $Q_X$, respectively. Define the SDPI
constant of $K$ as

$$\eta(K) = \sup_{P_X} \sup_{Q_X \ne P_X} \frac{D(Q_Y \| P_Y)}{D(Q_X \| P_X)}.$$

The SDPI constants of some common discrete channels have
closed-form expressions. For example, for a binary symmetric
channel (BSC) with crossover probability $p$, $\eta(\mathrm{BSC}(p)) = (1 - 2p)^2$ [26],
and for a binary erasure channel (BEC) with
erasure probability $p$, $\eta(\mathrm{BEC}(p)) = 1 - p$. It can be shown
that $\eta(K)$ is also the maximum mutual information contraction
ratio in a Markov chain $U \to X \to Y$ with $P_{Y|X} = K$ [27]:

$$\eta(K) = \sup_{P_{U,X}} \frac{I(U; Y)}{I(U; X)}$$

(see [28, App. B] for a proof of this formula in the setting of
abstract alphabets). Consequently, for any such Markov chain,

$$I(U; Y) \le \eta(K)\, I(U; X).$$
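The closed-form constant for the BSC can be probed numerically. The sketch below, with helper names of our own choosing and natural logarithms assumed, samples pairs of binary input distributions, pushes them through BSC($p$), and records the largest observed divergence ratio. Random sampling only explores the supremum from below, so this is a consistency check of the stated value $(1 - 2p)^2$ and not a proof of it.

```python
import math
import random

def kl_bernoulli(q, p):
    """D(Bern(q) || Bern(p)) in nats, for q and p strictly inside (0, 1)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def bsc_output(a, p):
    """P[Y = 1] when X ~ Bern(a) is sent through BSC(p)."""
    return a * (1 - p) + (1 - a) * p

def eta_bsc(p):
    """SDPI constant of BSC(p) as stated in the text."""
    return (1 - 2 * p) ** 2

def eta_bec(p):
    """SDPI constant of BEC(p) as stated in the text."""
    return 1 - p

def worst_ratio(p, samples=10000, seed=0):
    """Largest sampled ratio D(Q_Y || P_Y) / D(Q_X || P_X) for BSC(p)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        a, b = rng.uniform(0.01, 0.99), rng.uniform(0.01, 0.99)
        if abs(a - b) < 1e-3:
            continue  # avoid dividing by a numerically tiny divergence
        ratio = kl_bernoulli(bsc_output(b, p), bsc_output(a, p)) / kl_bernoulli(b, a)
        worst = max(worst, ratio)
    return worst
```

For input pairs close to each other and close to the uniform distribution, the sampled ratio approaches $(1 - 2p)^2$, which is where the supremum is attained in the limit.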
This is a stronger result than the ordinary data processing
inequality for mutual information, as it quantitatively captures
the amount by which the information contracts after passing
through a channel. We will also need a conditional version of
the SDPI:

Lemma 3. For any Markov chain $U, V \to X \to Y$ with
$P_{Y|X} = K$,

$$I(U; Y | V) \le \eta(K)\, I(U; X | V).$$

For binary channels, this result was first proved by Evans and
Schulman [29, Corollary 1]. A proof for the general case is
included in [30, Lemma 2.7]. Finally, we will need a bound
on the SDPI constant of a product channel. The tensor product
of two channels $(\mathsf{X}_1, \mathsf{Y}_1, K_1)$ and $(\mathsf{X}_2, \mathsf{Y}_2, K_2)$ is a channel
$(\mathsf{X}_1 \times \mathsf{X}_2, \mathsf{Y}_1 \times \mathsf{Y}_2, K_1 \otimes K_2)$ with

$$K_1 \otimes K_2(y_1, y_2 | x_1, x_2) = K_1(y_1 | x_1) K_2(y_2 | x_2)$$

for all $(x_1, x_2) \in \mathsf{X}_1 \times \mathsf{X}_2$, $(y_1, y_2) \in \mathsf{Y}_1 \times \mathsf{Y}_2$. The extension
to more than two channels is obvious. The following lemma
is a special case of Corollary 2 of Polyanskiy and Wu [31],
obtained using the method of Evans and Schulman [29]. We
give the proof, since we adapt the underlying technique at
several points in this paper.

Lemma 4. For a product channel K = Ki, if the 

constituent channels satisfy p{Ki) < p for i £ {1 

then 

V(K) < 1 - (1 - p) m . 

Proof: Let $X^m$ and $Y^m$ be the input and output of the
product channel $K = K_1 \otimes \ldots \otimes K_m$. Let $U$ be an arbitrary
random variable, such that $U \to X^m \to Y^m$ form a Markov
chain. It suffices to show that

\[ I(U;Y^m) \le \big(1 - (1-\eta)^m\big)\, I(U;X^m). \tag{15} \]

From the chain rule,

\[ I(U;Y^m) = I(U;Y^{m-1}) + I(U;Y_m|Y^{m-1}). \]

Since $U, Y^{m-1} \to X_m \to Y_m$ form a Markov chain, and
$P_{Y_m|X_m} = K_m$, Lemma 3 gives

\[ I(U;Y_m|Y^{m-1}) \le \eta(K_m)\, I(U;X_m|Y^{m-1}) \le \eta\, I(U;X_m|Y^{m-1}). \]

It follows that

\[
\begin{aligned}
I(U;Y^m) &\le I(U;Y^{m-1}) + \eta\, I(U;X_m|Y^{m-1}) \\
&= (1-\eta)\, I(U;Y^{m-1}) + \eta\, I(U;Y^{m-1},X_m) \\
&\le (1-\eta)\, I(U;Y^{m-1}) + \eta\, I(U;X^m),
\end{aligned}
\]

where the last step follows from the ordinary data processing
inequality and the Markov chain $U \to X^m \to (Y^{m-1}, X_m)$.
Unrolling the above recursive upper bound on $I(U;Y^m)$ and
noting that $I(U;Y_1) \le \eta\, I(U;X_1)$, we get

\[
\begin{aligned}
I(U;Y^m) &\le (1-\eta)^{m-1}\eta\, I(U;X_1) + \ldots + (1-\eta)\eta\, I(U;X^{m-1}) + \eta\, I(U;X^m) \\
&\le \big((1-\eta)^{m-1} + \ldots + (1-\eta) + 1\big)\, \eta\, I(U;X^m) \\
&= \big(1 - (1-\eta)^m\big)\, I(U;X^m),
\end{aligned}
\]

which proves (15) and hence Lemma 4. ■
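The geometric-sum step that closes the proof can be checked numerically. The following Python sketch is illustrative only: the function names `product_sdpi_bound` and `unrolled_sum` are hypothetical, and the value $\eta = 0.16$ is just the SDPI constant $(1-2p)^2$ of a BSC(0.3), picked for illustration.

```python
def product_sdpi_bound(eta: float, m: int) -> float:
    """Lemma 4 bound 1 - (1 - eta)^m on the SDPI constant of an m-fold product channel."""
    return 1.0 - (1.0 - eta) ** m


def unrolled_sum(eta: float, m: int) -> float:
    """Coefficient from unrolling the recursion: eta * sum_{k=0}^{m-1} (1 - eta)^k."""
    return eta * sum((1.0 - eta) ** k for k in range(m))


if __name__ == "__main__":
    eta = 0.16  # assumed value: (1 - 2p)^2 with p = 0.3
    for m in (1, 2, 5, 10):
        # The two columns agree up to floating-point error.
        print(m, product_sdpi_bound(eta, m), unrolled_sum(eta, m))
```

The bound increases with $m$ and approaches 1, which reflects the fact that a product of many noisy channels can be almost non-contractive even when each constituent channel contracts information.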


D. Upper bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ via SDPI

Having the necessary background at hand, we can now
state our upper bounds based on SDPI constants. Let $K_v \triangleq
\bigotimes_{e \in \mathcal{E}_v} K_e$ be the overall transition law of the channels across
the cutset $\mathcal{E}_v$. Define

\[ \eta_v \triangleq \eta(K_v) \]

as the SDPI constant of $K_v$, and

\[ \eta_v^* \triangleq \max_{e \in \mathcal{E}_v} \eta(K_e) \]

as the largest SDPI constant among all the channels across $\mathcal{E}_v$.
Our second upper bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ involves these SDPI
constants, and the conditional entropy of $W_{\mathcal{S}^c}$ given $W_{\mathcal{S}}$.

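Lemma 5. For any set $\mathcal{S} \subset \mathcal{V}$, any node $v \in \mathcal{S}$, and any
$T$-step algorithm $\mathcal{A}$,

\[
\begin{aligned}
I(Z;\widehat{Z}_v|W_{\mathcal{S}}) &\le \big(1 - (1-\eta_v)^T\big)\, H(W_{\mathcal{S}^c}|W_{\mathcal{S}}) \\
&\le \big(1 - (1-\eta_v^*)^{|\mathcal{E}_v| T}\big)\, H(W_{\mathcal{S}^c}|W_{\mathcal{S}}).
\end{aligned}
\]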

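Proof: We adapt the proof of Lemma 4. For any $v$ and
$t$, define the shorthand $X_{v\leftarrow,t} \triangleq X_{(\mathcal{N}_{v\leftarrow},v),t}$. If $T = 0$,
then for any $v \in \mathcal{S}$, $\widehat{Z}_v = \psi_v(W_v)$; hence $I(Z;\widehat{Z}_v|W_{\mathcal{S}}) \le
I(Z;W_v|W_{\mathcal{S}}) = 0$. If $T \ge 1$, then for any $v \in \mathcal{S}$,

\[
\begin{aligned}
I(Z;\widehat{Z}_v|W_{\mathcal{S}})
&\le I(W_{\mathcal{S}}, W_{\mathcal{S}^c}; W_v, Y_v^T | W_{\mathcal{S}}) \\
&= I(W_{\mathcal{S}^c}; Y_v^T | W_{\mathcal{S}}) \\
&= I(W_{\mathcal{S}^c}; Y_v^{T-1} | W_{\mathcal{S}}) + I(W_{\mathcal{S}^c}; Y_{v,T} | W_{\mathcal{S}}, Y_v^{T-1}) \\
&\overset{(a)}{\le} I(W_{\mathcal{S}^c}; Y_v^{T-1} | W_{\mathcal{S}}) + \eta_v\, I(W_{\mathcal{S}^c}; X_{v\leftarrow,T} | W_{\mathcal{S}}, Y_v^{T-1}) \\
&= (1-\eta_v)\, I(W_{\mathcal{S}^c}; Y_v^{T-1} | W_{\mathcal{S}}) + \eta_v\, I(W_{\mathcal{S}^c}; Y_v^{T-1}, X_{v\leftarrow,T} | W_{\mathcal{S}}),
\end{aligned}
\]

where (a) follows from the conditional SDPI (Lemma 3)
and the fact that $W_{\mathcal{S}^c}, W_{\mathcal{S}}, Y_v^{t-1} \to X_{v\leftarrow,t} \to Y_{v,t}$ form
a Markov chain for $t \in \{1,\ldots,T\}$. Unrolling the above
recursive upper bound on $I(W_{\mathcal{S}^c};Y_v^T|W_{\mathcal{S}})$, and noting that
$I(W_{\mathcal{S}^c};Y_{v,1}|W_{\mathcal{S}}) \le \eta_v\, I(W_{\mathcal{S}^c};X_{v\leftarrow,1}|W_{\mathcal{S}})$, we get

\[
\begin{aligned}
I(W_{\mathcal{S}^c};Y_v^T|W_{\mathcal{S}})
&\le (1-\eta_v)^{T-1}\eta_v\, I(W_{\mathcal{S}^c}; X_{v\leftarrow,1}|W_{\mathcal{S}}) + \ldots \\
&\quad + (1-\eta_v)\eta_v\, I(W_{\mathcal{S}^c}; Y_v^{T-2}, X_{v\leftarrow,T-1}|W_{\mathcal{S}}) \\
&\quad + \eta_v\, I(W_{\mathcal{S}^c}; Y_v^{T-1}, X_{v\leftarrow,T}|W_{\mathcal{S}}) \\
&\le \big((1-\eta_v)^{T-1} + \ldots + (1-\eta_v) + 1\big)\, \eta_v\, H(W_{\mathcal{S}^c}|W_{\mathcal{S}}) \\
&= \big(1 - (1-\eta_v)^T\big)\, H(W_{\mathcal{S}^c}|W_{\mathcal{S}}).
\end{aligned}
\]

The weakened upper bound follows from the fact that $\eta_v \le
1 - (1-\eta_v^*)^{|\mathcal{E}_v|}$ due to Lemma 4. This completes the proof
of Lemma 5. ■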

Comparing Lemma 2 and Lemma 5, we note that the upper 
bound in Lemma 2 captures the communication constraints 
through the cutset capacity alone, in accordance with the 
fact that the communication constraints do not depend on 
W or Z. The bound applies when W is either discrete or 
continuous; however, it grows linearly with T. By contrast, the 
upper bound in Lemma 5 builds on the fact that $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$
is upper bounded by $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$, and goes a step further
by capturing the communication constraint through a
multiplicative contraction of $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$. It never exceeds
$H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$ as $T$ increases. However, it is useful only when
the conditional entropy $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$ is well-defined and finite
(e.g., when $W$ is discrete). We give an explicit comparison of
Lemma 2 and Lemma 5 in the following example:

Example 1. Consider a two-node network, where the nodes
are connected by BSCs. The problem is for the two nodes
to compute the mod-2 sum of their one-bit observations.
Formally, we have $G = (\mathcal{V},\mathcal{E})$ with $\mathcal{V} = \{1,2\}$, $\mathcal{E} =
\{(1,2),(2,1)\}$, $K_{(1,2)} = K_{(2,1)} = \mathrm{BSC}(p)$, $W_1$ and $W_2$ are
independent $\mathrm{Bern}(\tfrac{1}{2})$ r.v.'s, $Z = W_1 \oplus W_2$, and $d(z,\widehat{z}) =
\mathbf{1}\{z \neq \widehat{z}\}$.

Choosing $\mathcal{S} = \{2\}$, Lemma 2 gives

\[ I(Z;\widehat{Z}_2|W_2) \le (1 - h_2(p))\, T, \tag{16} \]

whereas Lemma 5, together with the fact that $\eta(\mathrm{BSC}(p)) =
(1-2p)^2$, gives

\[ I(Z;\widehat{Z}_2|W_2) \le 1 - (4p\bar{p})^T, \tag{17} \]

where, for $p \in [0,1]$, $\bar{p} \triangleq 1 - p$. For this example, the cutset-capacity
upper bound is always tighter for small $T$, as

\[ \frac{\partial \big(1 - (4p\bar{p})^T\big)}{\partial T}\bigg|_{T=0} = \log\frac{1}{4p\bar{p}} \ge 1 - h_2(p), \qquad p \in [0,1]. \]

Fig. 3 shows the two upper bounds with $p = 0.3$: the cutset-capacity
upper bound is tighter when $T < 5$.
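The comparison between the two bounds of Example 1 can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names (`h2`, `cutset_bound`, `sdpi_bound`, `first_crossover`) are hypothetical, and both bounds are evaluated in bits, i.e., with base-2 logarithms.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def cutset_bound(p: float, T: int) -> float:
    """Right-hand side of (16): (1 - h2(p)) * T."""
    return (1 - h2(p)) * T


def sdpi_bound(p: float, T: int) -> float:
    """Right-hand side of (17): 1 - (4 p (1 - p))^T."""
    return 1 - (4 * p * (1 - p)) ** T


def first_crossover(p: float, t_max: int = 1000):
    """Smallest T >= 1 at which the SDPI bound is no larger than the cutset bound."""
    for T in range(1, t_max + 1):
        if sdpi_bound(p, T) <= cutset_bound(p, T):
            return T
    return None


if __name__ == "__main__":
    p = 0.3
    for T in range(1, 9):
        print(T, round(cutset_bound(p, T), 4), round(sdpi_bound(p, T), 4))
    print("first crossover:", first_crossover(p))
```

For $p = 0.3$ the cutset-capacity bound is the smaller of the two for $T \le 4$, and the SDPI bound takes over from $T = 5$ onward, in agreement with the statement about Fig. 3.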



Fig. 3: Comparison of upper bounds in Lemma 2 and Lemma 5 
for computing mod -2 sum in a two-node network. 


E. Lower bounds on computation time 

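We now proceed to derive lower bounds on the computation
time $T(\varepsilon,\delta)$ based on the previously derived lower and upper
bounds on the conditional mutual information $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$.
Define the shorthand notation

\[ \ell(\mathcal{S},\varepsilon,\delta) \triangleq (1-\delta)\log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]} - h_2(\delta), \tag{18} \]

which is the lower bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ in Lemma 1.

1) Cutset-capacity bounds: Combined with the conditional
small ball probability lower bound in Lemma 1, the cutset-capacity
upper bound in Lemma 2 leads to a lower bound on
$T(\varepsilon,\delta)$:

Theorem 1. For an arbitrary network, for any $\varepsilon \ge 0$ and
$\delta \in (0,1/2]$,

\[ T(\varepsilon,\delta) \ge \max_{\mathcal{S} \subset \mathcal{V}} \frac{\ell(\mathcal{S},\varepsilon,\delta)}{C_{\mathcal{S}}}. \]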

From an operational point of view, the lower bound of
Theorem 1 reflects the fact that the problem of distributed
function computation is, in a certain sense, a joint source-channel
coding (JSCC) problem with possibly noisy feedback.
In particular, the lower bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ from Lemma 1,
which is used to prove Theorem 1, can be interpreted in
terms of a reduction of JSCC to generalized list decoding [32,
Sec. III.B]. Given any algorithm $\mathcal{A}$ and any node $v \in \mathcal{V}$, we
may construct a "list decoder" as follows: given the estimate
$\widehat{Z}_v$, we generate a "list" $\{z \in \mathsf{Z} : d(z,\widehat{Z}_v) \le \varepsilon\}$. If we fix
a set $\mathcal{S} \subset \mathcal{V}$ and allow all the nodes in $\mathcal{S}$ to share their
observations $W_{\mathcal{S}}$, then $\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]$ is an upper bound on
the $P_{Z|W_{\mathcal{S}}}$-measure of the list of any node $v \in \mathcal{S}$. Therefore,
$\ell(\mathcal{S},\varepsilon,\delta)$ is a lower bound on the total amount of information
that is necessary for the JSCC problem. The complementary
cutset upper bound on $I(Z;\widehat{Z}_v|W_{\mathcal{S}})$ bounds the amount of
information that can be accumulated with each channel use.
The lower bound on $T(\varepsilon,\delta)$ can thus be interpreted as a lower
bound on the blocklength of the JSCC problem.

As we will demonstrate in Section IV, based on Theorem 1,
it is possible to exploit structural properties of the function $f$
(such as linearity) and of the probability law $P_W$ (such as log-concavity)
to derive lower bounds on the computation time that
are often tighter than existing bounds.

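2) SDPI bounds: Combining the lower bound of Lemma 1
with the SDPI upper bound of Lemma 5, we get the following:

Theorem 2. For an arbitrary network, for any $\varepsilon \ge 0$ and
$\delta \in (0,1/2]$,

\[ T(\varepsilon,\delta) \ge \max_{\mathcal{S} \subset \mathcal{V}} \max_{v \in \mathcal{S}} \frac{\log\Big(1 - \frac{\ell(\mathcal{S},\varepsilon,\delta)}{H(W_{\mathcal{S}^c}|W_{\mathcal{S}})}\Big)^{-1}}{|\mathcal{E}_v| \log(1-\eta_v^*)^{-1}}, \tag{19} \]

where $\eta_v^* = \max_{e \in \mathcal{E}_v} \eta(K_e)$.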


The lower bounds in Theorem 1 and Theorem 2 can behave 
quite differently. To illustrate this, we compare them in two 
cases: 

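When $H(W_{\mathcal{S}^c}|W_{\mathcal{S}}) \gg \log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]}$, Theorem 2 gives

\[
\begin{aligned}
T(\varepsilon,\delta) &\ge \max_{\mathcal{S} \subset \mathcal{V}} \max_{v \in \mathcal{S}} \frac{\log\Big(1 - \frac{\ell(\mathcal{S},\varepsilon,\delta)}{H(W_{\mathcal{S}^c}|W_{\mathcal{S}})}\Big)^{-1}}{|\mathcal{E}_v| \log(1-\eta_v^*)^{-1}} \\
&\approx \max_{\mathcal{S} \subset \mathcal{V}} \max_{v \in \mathcal{S}} \frac{\ell(\mathcal{S},\varepsilon,\delta)\log e}{H(W_{\mathcal{S}^c}|W_{\mathcal{S}})\, |\mathcal{E}_v| \log(1-\eta_v^*)^{-1}},
\end{aligned}
\]

which has essentially the same dependence on $\ell(\mathcal{S},\varepsilon,\delta)$ as
the lower bound given by Theorem 1. In this case, Theorem 1
gives more useful lower bounds as long as $C_{\mathcal{S}} \ll
H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$, especially when $W$ is continuous.

When $H(W_{\mathcal{S}^c}|W_{\mathcal{S}}) \approx \log\frac{1}{\mathbb{E}[L(W_{\mathcal{S}},\varepsilon)]}$ and $\delta$ is small,
$H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$ serves as a sharp proxy of $\ell(\mathcal{S},\varepsilon,\delta)$. Theorem 1
in this case gives

\[ T(\varepsilon,\delta) \ge \max_{\mathcal{S} \subset \mathcal{V}} \frac{\ell(\mathcal{S},\varepsilon,\delta)}{C_{\mathcal{S}}} \approx \max_{\mathcal{S} \subset \mathcal{V}} \frac{H(W_{\mathcal{S}^c}|W_{\mathcal{S}})}{C_{\mathcal{S}}}, \]

while Theorem 2 gives

\[
\begin{aligned}
T(\varepsilon,\delta) &\ge \max_{\mathcal{S} \subset \mathcal{V}} \max_{v \in \mathcal{S}} \frac{\log\Big(1 - \frac{\ell(\mathcal{S},\varepsilon,\delta)}{H(W_{\mathcal{S}^c}|W_{\mathcal{S}})}\Big)^{-1}}{|\mathcal{E}_v| \log(1-\eta_v^*)^{-1}} \\
&\approx \max_{\mathcal{S} \subset \mathcal{V}} \max_{v \in \mathcal{S}} \frac{\log H(W_{\mathcal{S}^c}|W_{\mathcal{S}}) + \log\frac{1}{h_2(\delta)}}{|\mathcal{E}_v| \log(1-\eta_v^*)^{-1}},
\end{aligned}
\]

where in the last step we have used the fact that
$\log\Big(\delta + \frac{h_2(\delta)}{H(W_{\mathcal{S}^c}|W_{\mathcal{S}})}\Big) \sim \log\frac{h_2(\delta)}{H(W_{\mathcal{S}^c}|W_{\mathcal{S}})}$ as $\delta \to 0$. Theorem 1
in this case is sharper in capturing the dependence of
$T(\varepsilon,\delta)$ on the amount of information contained in $Z$, in that
the lower bound is proportional to $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$, whereas the
lower bound given by Theorem 2 depends on $H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$
only through $\log H(W_{\mathcal{S}^c}|W_{\mathcal{S}})$. On the other hand, Theorem 2
in this case is much sharper in capturing the dependence of
$T(\varepsilon,\delta)$ on the confidence parameter $\delta$, since $\log\frac{1}{h_2(\delta)}$ grows
without bound as $\delta \to 0$, while the lower bound given by
Theorem 1 remains bounded. We consider two examples for
this case.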

The first is Example 1 in Section II-D, for the two-node
mod-2 sum problem. We have $L(w_2,\varepsilon) = \max_{z \in \{0,1\}} \mathbb{P}[W_1 \oplus
W_2 = z | W_2 = w_2] = \tfrac{1}{2}$ and $\ell(\mathcal{S},0,\delta) = 1 - \delta - h_2(\delta)$.
Theorems 1 and 2 imply the following:

Corollary 1. For the problem in Example 1, for $\delta \in (0,1/2]$,
the $(0,\delta)$-computation time satisfies

\[ T(0,\delta) \ge \max\left\{ \frac{1 - \delta - h_2(\delta)}{1 - h_2(p)},\ \frac{\log\big(\delta + h_2(\delta)\big)^{-1}}{\log(4p\bar{p})^{-1}} \right\}, \tag{20} \]

where the first lower bound is given by Theorem 1, and the
second one is given by Theorem 2.


To obtain an achievable upper bound on $T(0,\delta)$ in Example 1,
we consider the algorithm where each node uses a length-$T$
repetition code to send its one-bit observation to the other
node. Using the Chernoff bound, as in [33], it can be shown
that the probability of decoding error at each node is upper-bounded
by $(4p\bar{p})^{T/2}$, and therefore this algorithm achieves
accuracy $\varepsilon = 0$ with confidence parameter $\delta \le (4p\bar{p})^{T/2}$.
This gives the upper bound

\[ T(0,\delta) \le \frac{2\log\delta^{-1}}{\log(4p\bar{p})^{-1}}. \tag{21} \]

Comparing (21) with the second lower bound in (20), we see
that they asymptotically differ only by a factor of 2 as $\delta \to 0$,
as $\lim_{\delta \to 0} \log(\delta + h_2(\delta))/\log(\delta) = 1$. Thus, for the problem
in Example 1, the converse lower bound on $T(0,\delta)$ obtained
from the SDPI closely matches the achievable upper bound on
$T(0,\delta)$.
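The gap between the converse (20) and the achievability bound (21) can be explored numerically. The sketch below is illustrative only: the function names are hypothetical, logarithms are taken in base 2 (the ratios involved do not depend on the base), and the parameter values are arbitrary.

```python
import math


def h2(x: float) -> float:
    """Binary entropy function in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def cutset_term(delta: float, p: float) -> float:
    """First term in (20), obtained from Theorem 1."""
    return (1 - delta - h2(delta)) / (1 - h2(p))


def sdpi_term(delta: float, p: float) -> float:
    """Second term in (20), obtained from Theorem 2."""
    return math.log2(1 / (delta + h2(delta))) / math.log2(1 / (4 * p * (1 - p)))


def lower_bound_20(delta: float, p: float) -> float:
    """Converse (20): the larger of the two terms."""
    return max(cutset_term(delta, p), sdpi_term(delta, p))


def upper_bound_21(delta: float, p: float) -> float:
    """Repetition-code achievability bound (21)."""
    return 2 * math.log2(1 / delta) / math.log2(1 / (4 * p * (1 - p)))


if __name__ == "__main__":
    p = 0.3
    for delta in (1e-2, 1e-4, 1e-8, 1e-12):
        lo, hi = lower_bound_20(delta, p), upper_bound_21(delta, p)
        print(delta, round(lo, 2), round(hi, 2), round(hi / sdpi_term(delta, p), 3))
```

The last printed column is the ratio of (21) to the SDPI term of (20). It stays above 2 and decreases toward 2 only slowly, because $h_2(\delta)$ exceeds $\delta$ by a logarithmic factor.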


The second example concerns the problem of disseminating 
all of the observations through an arbitrary network: 

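Example 2. Consider the problem where $W_v$'s are i.i.d.
samples from the uniform distribution over $\{1,\ldots,M\}$,
$Z = W$, and $d(z,\widehat{z}) = \mathbf{1}\{z \neq \widehat{z}\}$. In other words, the goal of
the nodes is to distribute their observations to all other nodes.

In this example, $H(W_{\mathcal{S}^c}|W_{\mathcal{S}}) = |\mathcal{S}^c|\log M$, and $\ell(\mathcal{S},0,\delta) =
(1-\delta)|\mathcal{S}^c|\log M - h_2(\delta)$. Following Ayaso et al. [1,
Def. III.4], we define the conductance of the network $G$ as

\[ \Phi(G) \triangleq \min_{\mathcal{S} \subset \mathcal{V}:\ |\mathcal{V}|/2 \le |\mathcal{S}| < |\mathcal{V}|} \frac{C_{\mathcal{S}}}{|\mathcal{S}^c|}. \]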


Then we have the following corollary: 


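Corollary 2. For the problem in Example 2, Theorem 1 gives

\[
\begin{aligned}
T(0,\delta) &\ge \max_{\mathcal{S} \subset \mathcal{V}} \frac{(1-\delta)|\mathcal{S}^c|\log M - h_2(\delta)}{C_{\mathcal{S}}} && (22) \\
&\gtrsim \frac{\log M}{\Phi(G)} \quad \text{as } \delta \to 0, && (23)
\end{aligned}
\]

whereas Theorem 2 gives

\[ T(0,\delta) \gtrsim \max_{\mathcal{S} \subset \mathcal{V}} \max_{v \in \mathcal{S}} \frac{\log\big(|\mathcal{S}^c|\log M\big) + \log h_2(\delta)^{-1}}{|\mathcal{E}_v| \log(1-\eta_v^*)^{-1}} \quad \text{as } \delta \to 0. \tag{24} \]

Again, we see that the lower bound obtained from SDPI is
much sharper for capturing the dependence of $T(0,\delta)$ on $\delta$,
since $\log h_2(\delta)^{-1} \to +\infty$ as $\delta \to 0$. On the other hand, the
lower bound obtained from the cutset capacity upper bound
is tighter in its dependence on $M$, and can also capture the
dependence on the conductance of the network.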
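For small networks the conductance appearing in (23) can be evaluated by brute force. The Python sketch below makes several assumptions that are not fixed by the text at this point: channels are given as a dictionary mapping directed edges $(u,v)$ to capacities, $C_{\mathcal{S}}$ is taken to be the total capacity of the edges entering $\mathcal{S}$ from $\mathcal{S}^c$, and the four-node ring with unit capacities is a made-up example.

```python
import math
from itertools import combinations


def cutset_capacity(capacity: dict, S: set) -> float:
    """C_S: total capacity of directed edges (u, v) with u outside S and v inside S."""
    return sum(c for (u, v), c in capacity.items() if u not in S and v in S)


def conductance(nodes, capacity: dict) -> float:
    """Minimum of C_S / |S^c| over all S with |V|/2 <= |S| < |V| (exhaustive search)."""
    nodes = list(nodes)
    n = len(nodes)
    best = math.inf
    for k in range((n + 1) // 2, n):  # subset sizes from ceil(n/2) to n - 1
        for subset in combinations(nodes, k):
            S = set(subset)
            best = min(best, cutset_capacity(capacity, S) / (n - k))
    return best


def dissemination_bound(M: int, phi: float) -> float:
    """Right-hand side of (23): log M / Phi(G), in bits."""
    return math.log2(M) / phi


if __name__ == "__main__":
    # Hypothetical 4-node ring with unit-capacity channels in both directions.
    ring = [(1, 2), (2, 3), (3, 4), (4, 1)]
    capacity = {}
    for u, v in ring:
        capacity[(u, v)] = 1.0
        capacity[(v, u)] = 1.0
    phi = conductance([1, 2, 3, 4], capacity)
    print("conductance:", phi)
    print("bound (23) for M = 16:", dissemination_bound(16, phi))
```

The exhaustive search is exponential in $|\mathcal{V}|$, so this is meant only for checking small examples by hand.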

Finally, we point out that Theorem 1 gives the correct
lower bound $T(\varepsilon,\delta) = +\infty$ when the network graph $G$ is
disconnected (assuming $f$ depends on the observations of all
nodes): If $\mathcal{V}$ consists of two disconnected components $\mathcal{S}$ and
$\mathcal{S}^c$, then $C_{\mathcal{S}} = 0$, which results in $T(\varepsilon,\delta) = +\infty$. Despite the
sharp dependence of the lower bounds of Theorems 1 and 2 on
$\varepsilon$ and $\delta$, they have the same limitation as all previously known
bounds obtained via single-cutset arguments: they examine
only the flow of information across a cutset $\mathcal{E}_{\mathcal{S}}$, but not within
$\mathcal{S}$, hence they cannot capture the dependence of computation
time on the diameter of the network. We address this limitation
in the following section.


III. Multi-cutset analysis 

We now extend the techniques of Section II to a multi-cutset
analysis, to address the limitation of the results obtained
from the single-cutset analysis. In particular, the new results
are able to quantify the dissipation of information as it flows
across a succession of cutsets in the network. As briefly
sketched in Sec. I-B, we accomplish this by partitioning a
general network using multiple disjoint cutsets, such that the
operation of any algorithm on the network can be simulated
by another algorithm running on a chain of bidirectional noisy
links. We then derive tight mutual information upper bounds
for such chains, which in turn can be used to lower-bound the
computation time for the original network.
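A. Network reduction

Consider an arbitrary network $G = (\mathcal{V},\mathcal{E})$. If there exists
a collection of nested subsets $\mathcal{P}_1 \subset \ldots \subset \mathcal{P}_{n-1}$ of $\mathcal{V}$, such
that the associated cutsets $\mathcal{E}_{\mathcal{P}_1},\ldots,\mathcal{E}_{\mathcal{P}_{n-1}}$ are disjoint, and the
cutsets $\mathcal{E}_{\mathcal{P}_1^c},\ldots,\mathcal{E}_{\mathcal{P}_{n-1}^c}$ are also disjoint, then we say that $G$
is successively partitioned according to $\mathcal{P}_1,\ldots,\mathcal{P}_{n-1}$ into $n$
subsets $\mathcal{S}_1,\ldots,\mathcal{S}_n$, where $\mathcal{S}_i = \mathcal{P}_i \setminus \mathcal{P}_{i-1}$, with $\mathcal{P}_0 \triangleq \emptyset$ and
$\mathcal{P}_n \triangleq \mathcal{V}$. For $i \in \{2,\ldots,n\}$, a node in $\mathcal{S}_i$ is called a left-bound
node of $\mathcal{S}_i$ if there is an edge from it to a node in
$\mathcal{S}_{i-1}$. The set of left-bound nodes of $\mathcal{S}_i$ is denoted by $\overleftarrow{\partial}\mathcal{S}_i$.
For $\mathcal{S}_1$, define $\overleftarrow{\partial}\mathcal{S}_1 = \{v\}$ for an arbitrary $v \in \mathcal{S}_1$. In addition,
for $i \in \{2,\ldots,n\}$, let

\[ d_i \triangleq |\mathcal{E}_{\mathcal{P}_{i-1}^c}| + |\mathcal{E}_{\mathcal{P}_i}| + |\mathcal{E} \cap (\mathcal{S}_i \times \overleftarrow{\partial}\mathcal{S}_i)| \tag{25} \]

be the number of edges entering $\mathcal{S}_i$ from its neighbors $\mathcal{S}_{i-1}$
and $\mathcal{S}_{i+1}$, plus the number of edges entering $\overleftarrow{\partial}\mathcal{S}_i$ from $\mathcal{S}_i$
itself. For example, Fig. 2a in Sec. I-B illustrates a successive
partition of a six-node network into three subsets $\mathcal{S}_1 = \{1,4\}$,
$\mathcal{S}_2 = \{2,5\}$ and $\mathcal{S}_3 = \{3,6\}$, with $\overleftarrow{\partial}\mathcal{S}_1 = \{4\}$, $\overleftarrow{\partial}\mathcal{S}_2 = \{2\}$
and $\overleftarrow{\partial}\mathcal{S}_3 = \{3,6\}$. In addition, $d_2 = 5$ and $d_3 = 4$. As another
example, the network in Fig. 4a, where each undirected edge
represents a pair of channels with opposite directions, can be
successively partitioned into $\mathcal{S}_1 = \{1\}$, $\mathcal{S}_2 = \{2,7\}$, $\mathcal{S}_3 =
\{3,6,8,9\}$, $\mathcal{S}_4 = \{4,10\}$, and $\mathcal{S}_5 = \{5\}$, with $\overleftarrow{\partial}\mathcal{S}_1 = \{1\}$,
$\overleftarrow{\partial}\mathcal{S}_2 = \{2,7\}$, $\overleftarrow{\partial}\mathcal{S}_3 = \{3,8\}$, $\overleftarrow{\partial}\mathcal{S}_4 = \{4,10\}$, and $\overleftarrow{\partial}\mathcal{S}_5 = \{5\}$.
In addition, $d_2 = 6$, $d_3 = 7$, $d_4 = 6$, and $d_5 = 2$.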
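The quantities in (25) are straightforward to compute once a successive partition is given. The following Python sketch is a hypothetical illustration: the edge lists of the networks in Fig. 2a and Fig. 4a are not reproduced in the text, so the example uses a made-up four-node path with bidirectional links, partitioned into $\{1\}$, $\{2,3\}$, $\{4\}$. The function names and the 0-based indexing of the subsets are assumptions of the sketch.

```python
def left_bound_nodes(edges, parts, i):
    """Nodes of parts[i] that have an edge to a node in parts[i - 1] (0-based i >= 1)."""
    return {u for (u, v) in edges if u in parts[i] and v in parts[i - 1]}


def d(edges, parts, i):
    """Evaluate (25) for the subset parts[i] (0-based i >= 1).

    Counts directed edges entering parts[i] from parts[i - 1] and from parts[i + 1],
    plus edges from parts[i] into its own left-bound nodes.
    """
    here = parts[i]
    prev_part = parts[i - 1]
    next_part = parts[i + 1] if i + 1 < len(parts) else set()
    lb = left_bound_nodes(edges, parts, i)
    from_prev = sum(1 for (u, v) in edges if u in prev_part and v in here)
    from_next = sum(1 for (u, v) in edges if u in next_part and v in here)
    internal = sum(1 for (u, v) in edges if u in here and v in lb)
    return from_prev + from_next + internal


if __name__ == "__main__":
    # Hypothetical network: path 1 - 2 - 3 - 4, each link being a pair of directed channels.
    undirected = [(1, 2), (2, 3), (3, 4)]
    edges = [(u, v) for (u, v) in undirected] + [(v, u) for (u, v) in undirected]
    parts = [{1}, {2, 3}, {4}]
    for i in range(1, len(parts)):
        print("subset", i + 1, "left-bound:", left_bound_nodes(edges, parts, i),
              "d:", d(edges, parts, i))
```

In this example only node 2 is a left-bound node of the middle subset, so the internal edge $(3,2)$ is counted in $d_2$ while $(2,3)$ is not.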


Fig. 4: A successive partition of a network and the chain
reduced according to it.

Formally, a network $G$ has bidirectional links if, for any pair
of nodes $u, v \in \mathcal{V}$, $(u,v) \in \mathcal{E}$ if and only if $(v,u) \in \mathcal{E}$. A path
between $u$ and $v$ is a sequence of edges $\{(v_i, v_{i+1})\}_{i=1}^{k-1}$, such
that $v_1 = u$ and $v_k = v$ (if $G$ is connected, there is at least one
path between any pair of nodes). The graph distance between
$u$ and $v$, denoted by $d_G(u,v)$, is the length of a shortest path
between $u$ and $v$ (shortest paths are not necessarily unique).
The diameter of $G$ is then defined by

\[ \mathrm{diam}(G) \triangleq \max_{u \in \mathcal{V}} \max_{v \in \mathcal{V}} d_G(u,v). \]

The following lemma states that any such network $G$ can be
successively partitioned into $n = \mathrm{diam}(G) + 1$ subsets:

Lemma 6. Any network $G = (\mathcal{V},\mathcal{E})$ with bidirectional links
(i.e., $(u,v) \in \mathcal{E}$ if and only if $(v,u) \in \mathcal{E}$) admits a successive
partition into subsets $\mathcal{S}_1,\ldots,\mathcal{S}_n$ with $n = \mathrm{diam}(G) + 1$.

Fig. 5: Another successive partition (using the construction in
the proof of Lemma 6) and the chain reduced according to it.

Proof: For any $v \in \mathcal{V}$ and any $r \in \{0 : \mathrm{diam}(G)\}$, we
define the sets

\[ \mathbb{B}_G(v,r) \triangleq \{u \in \mathcal{V} : d_G(v,u) \le r\} \]

and

\[ \mathbb{S}_G(v,r) \triangleq \{u \in \mathcal{V} : d_G(v,u) = r\}, \]

i.e., the ball and the sphere of radius $r$ centered at $v$. In
particular, $\mathbb{B}_G(v,r) = \mathbb{B}_G(v,r-1) \cup \mathbb{S}_G(v,r)$.

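We now construct the desired successive partition. Let $n =
\mathrm{diam}(G) + 1$, and pick any pair of nodes $v_0, v_1 \in \mathcal{V}$ that
achieve the maximum in the definition of $\mathrm{diam}(G)$. With this,
we take

\[ \mathcal{P}_i = \mathbb{B}_G(v_0, i-1), \qquad i = 1,\ldots,n. \]

Clearly, $\mathcal{P}_1 = \{v_0\} \subset \mathcal{P}_2 \subset \ldots \subset \mathcal{P}_n = \mathcal{V}$, and moreover

\[ \mathcal{S}_i = \mathbb{S}_G(v_0, i-1), \qquad i = 1,\ldots,n. \]

From this construction, we see that

\[ \mathcal{E}_{\mathcal{P}_i} = \{(u,v) \in \mathcal{E} : u \in \mathcal{S}_{i+1},\ v \in \mathcal{S}_i\} \]

and

\[ \mathcal{E}_{\mathcal{P}_i^c} = \{(u,v) \in \mathcal{E} : u \in \mathcal{S}_i,\ v \in \mathcal{S}_{i+1}\}. \]

The pairwise disjointness of the cutsets $\mathcal{E}_{\mathcal{P}_i}$, as well as of the
cutsets $\mathcal{E}_{\mathcal{P}_i^c}$, is immediate. ■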
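The construction in the proof amounts to grouping nodes by their graph distance from $v_0$, which a breadth-first search computes directly. The Python sketch below assumes a connected graph with bidirectional links given as an adjacency dictionary; the function names and the two test graphs (a 6-node ring and a $4 \times 4$ grid) are illustrative choices.

```python
from collections import deque


def bfs_distances(adj: dict, source) -> dict:
    """Graph distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_endpoint(adj: dict):
    """A node whose eccentricity equals the diameter (assumes a connected graph)."""
    return max(adj, key=lambda v: max(bfs_distances(adj, v).values()))


def successive_partition(adj: dict, v0) -> list:
    """Spheres S_G(v0, r) for r = 0, 1, ...: the subsets S_1, ..., S_n of the proof."""
    dist = bfs_distances(adj, v0)
    parts = [set() for _ in range(max(dist.values()) + 1)]
    for v, r in dist.items():
        parts[r].add(v)
    return parts


def ring(num_nodes: int) -> dict:
    """Ring with nodes 1..num_nodes and bidirectional links."""
    return {i: [(i % num_nodes) + 1, ((i - 2) % num_nodes) + 1]
            for i in range(1, num_nodes + 1)}


def grid(side: int) -> dict:
    """side x side grid with bidirectional links between horizontal and vertical neighbors."""
    adj = {(r, c): [] for r in range(side) for c in range(side)}
    for (r, c) in adj:
        for (dr, dc) in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (r + dr, c + dc) in adj:
                adj[(r, c)].append((r + dr, c + dc))
    return adj


if __name__ == "__main__":
    print(successive_partition(ring(6), 1))
    g = grid(4)
    print(len(successive_partition(g, diameter_endpoint(g))))
```

Starting from a node that does not attain the diameter still yields a valid successive partition, only with fewer subsets than $\mathrm{diam}(G) + 1$.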

Remarks:

• Using the construction underlying the proof, we can also
show that, for any two nodes $u, v$ in $G$, we can successively
partition $G$ into $n = d_G(u,v) + 1$ subsets.

• For the successive partition constructed in the proof, all
nodes in $\mathcal{S}_i$ are left-bound nodes, and $d_i$ is the sum of
the in-degrees of the nodes in $\mathcal{S}_i$.

As an example, Fig. 5a shows the successive partition of the
network in Fig. 4a using the construction in the proof, where
$\mathcal{S}_1 = \{1\}$, $\mathcal{S}_2 = \{2,7\}$, $\mathcal{S}_3 = \{3,8\}$, $\mathcal{S}_4 = \{4,6,9\}$, $\mathcal{S}_5 =
\{5,10\}$, with $\overleftarrow{\partial}\mathcal{S}_i = \mathcal{S}_i$, $i \in \{1,\ldots,5\}$, and $d_2 = 6$, $d_3 = 6$,
$d_4 = 9$, and $d_5 = 5$.

The successive partition of $G$ ensures that nodes in $\mathcal{S}_i$ only
communicate with nodes in $\mathcal{S}_{i-1}$ and $\mathcal{S}_{i+1}$, as well as among
themselves. Indeed, suppose that the network graph $G$ includes
an edge $e = (u,v) \in \mathcal{E}$ with $u \in \mathcal{S}_i$ and $v \in \mathcal{S}_j$, where
$i > j + 1$. By construction of the successive partition, $u \in
\mathcal{P}_{j+1}^c \subset \mathcal{P}_j^c$ and $v \in \mathcal{P}_j \subset \mathcal{P}_{j+1}$. Therefore, $e$ belongs to
both $\mathcal{E}_{\mathcal{P}_j}$ and $\mathcal{E}_{\mathcal{P}_{j+1}}$. However, the cutsets $\mathcal{E}_{\mathcal{P}_j}$ and $\mathcal{E}_{\mathcal{P}_{j+1}}$ are
disjoint, so we arrive at a contradiction. Likewise, we can use
the disjointness of the cutsets $\mathcal{E}_{\mathcal{P}_i^c}$ and $\mathcal{E}_{\mathcal{P}_{i+1}^c}$ to show that the
network graph contains no edges $(u,v)$ with $u \in \mathcal{S}_i$, $v \in \mathcal{S}_j$,
and $j > i + 1$.

In view of this, we can associate to the partition $\{\mathcal{S}_i\}$ a
bidirected chain $G' = (\mathcal{V}',\mathcal{E}')$, i.e., a network with vertex set
$\mathcal{V}' = \{1',\ldots,n'\}$, edge set

\[ \mathcal{E}' = \{(i',(i-1)')\}_{i=2}^{n} \cup \{(i',(i+1)')\}_{i=1}^{n-1} \cup \{(i',i')\}_{i=1}^{n}, \]

and channel transition laws

\[ K_{(i',(i-1)')} = \bigotimes_{(u,v) \in \mathcal{E}:\ u \in \mathcal{S}_i,\ v \in \mathcal{S}_{i-1}} K_{(u,v)}, \tag{26} \]

\[ K_{(i',(i+1)')} = \bigotimes_{(u,v) \in \mathcal{E}:\ u \in \mathcal{S}_i,\ v \in \mathcal{S}_{i+1}} K_{(u,v)}, \tag{27} \]

\[ K_{(i',i')} = \bigotimes_{(u,v) \in \mathcal{E}:\ u \in \mathcal{S}_i,\ v \in \overleftarrow{\partial}\mathcal{S}_i} K_{(u,v)}, \tag{28} \]

where node $i'$ in $G'$ observes

\[ W_{i'} = W_{\mathcal{S}_i}. \]

In other words, the subset $\mathcal{S}_i$ in $G$ is reduced to node $i'$
in $G'$; the channels across the subsets in $G$ are reduced
to the channels between the nodes in $G'$; and the channels
from $\mathcal{S}_i$ to $\overleftarrow{\partial}\mathcal{S}_i$ in $G$ are reduced to a self-loop at node
$i'$ in $G'$. The channels from $\mathcal{S}_i$ to $\mathcal{S}_i \setminus \overleftarrow{\partial}\mathcal{S}_i$ in $G$ are not
included in $G'$, and will be simulated by node $i'$ using
private randomness. For the network in Fig. 2a in Sec. I-B,
according to the illustrated partition, it can be reduced to a
3-node bidirected chain in Fig. 2b, with $K_{(1',1')} = K_{(1,4)}$,
$K_{(2',2')} = K_{(5,2)}$, and $K_{(3',3')} = K_{(3,6)} \otimes K_{(6,3)}$. For the
network in Fig. 4a, according to the illustrated partition, it
can be reduced to a 5-node bidirected chain in Fig. 4b, with
$K_{(2',2')} = K_{(2,7)} \otimes K_{(7,2)}$, $K_{(3',3')} = K_{(6,3)} \otimes K_{(6,8)} \otimes K_{(9,8)}$,
and $K_{(4',4')} = K_{(4,10)} \otimes K_{(10,4)}$. According to the partition
in Fig. 5a, the same network can be reduced to a 5-node
bidirected chain in Fig. 5b, with $K_{(2',2')} = K_{(2,7)} \otimes K_{(7,2)}$,
$K_{(4',4')} = K_{(6,9)} \otimes K_{(9,6)}$, and $K_{(5',5')} = K_{(5,10)} \otimes K_{(10,5)}$.

For the bidirected chain $G'$ reduced from $G$, we consider a
class of randomized $T$-step algorithms that run on $G'$ and
are of a more general form compared to the deterministic
algorithms considered so far. Such a randomized algorithm
operates as follows: at step $t \in \{1,\ldots,T\}$, node $i'$ computes
the outgoing messages $X_{(i',(i-1)'),t} = \varphi_{(i',(i-1)'),t}(W_{i'}, Y_{i'}^{t-1})$,
$X_{(i',(i+1)'),t} = \varphi_{(i',(i+1)'),t}(W_{i'}, Y_{i'}^{t-1}, U_{i'}^{t-1})$, and $X_{(i',i'),t} =
\varphi_{(i',i'),t}(W_{i'}, Y_{i'}^{t-1}, U_{i'}^{t-1})$, and computes the private message
$U_{i',t} = \vartheta_{i',t}(W_{i'}, Y_{i'}^{t-1}, U_{i'}^{t-1}, R_{i',t})$, where $R_{i',t}$ is the private
randomness held by node $i'$, uniformly distributed on $[0,1]$
and independent across $i' \in \mathcal{V}'$ and $t \in \{1,\ldots,T\}$. At step
$T$, node $i'$ computes the final estimate $\widehat{Z}_{i'} = \psi_{i'}(W_{i'}, Y_{i'}^{T})$
of $Z$. These randomized algorithms have the feature that the
message sent to the node on the left and the final estimate
of a node are computed solely based on the node's initial
observation and received messages, whereas the messages sent
to the node on the right and to itself are computed based on
the node's initial observation, received messages, as well as
private messages, and the computation of the private messages
involves the node's private randomness. Define

\[ T'(\varepsilon,\delta) \triangleq \inf\Big\{ T \in \mathbb{N} : \exists \text{ a randomized } T\text{-step algorithm } \mathcal{A}' \text{ such that } \max_{i' \in \mathcal{V}'} \mathbb{P}\big[d(Z,\widehat{Z}_{i'}) > \varepsilon\big] \le \delta \Big\} \tag{29} \]

as the $(\varepsilon,\delta)$-computation time for $Z$ on $G'$ using the randomized
algorithms described above. The following lemma
indicates that we can obtain lower bounds on $T(\varepsilon,\delta)$ by lower-bounding
$T'(\varepsilon,\delta)$.

Lemma 7. Consider an arbitrary network $G$ that can be
successively partitioned into $\mathcal{S}_1,\ldots,\mathcal{S}_n$, such that $\overleftarrow{\partial}\mathcal{S}_i$'s are
all nonempty. Let $G' = (\mathcal{V}',\mathcal{E}')$ be the bidirected chain
constructed from $G$ according to the partition. Then, given any
$T$-step algorithm $\mathcal{A}$ on $G$ that achieves $\max_{v \in \mathcal{V}} \mathbb{P}[d(Z,\widehat{Z}_v) >
\varepsilon] \le \delta$, we can construct a randomized $T$-step algorithm $\mathcal{A}'$ on
$G'$, such that $\max_{i' \in \mathcal{V}'} \mathbb{P}[d(Z,\widehat{Z}_{i'}) > \varepsilon] \le \delta$. Consequently,
$T(\varepsilon,\delta)$ for computing $Z$ on $G$ is lower bounded by $T'(\varepsilon,\delta)$
defined in (29).

Proof: Appendix A. ■

Remark: In the network reduction, we can alternatively map
all the channels from $\mathcal{S}_i$ to $\mathcal{S}_i$ (instead of only mapping the
channels from $\mathcal{S}_i$ to $\overleftarrow{\partial}\mathcal{S}_i$) in the original network $G$ to the
self-loop at node $i'$ of the reduced chain $G'$. By doing so,
to simulate the operation of an algorithm $\mathcal{A}$ that runs on $G$,
the algorithm $\mathcal{A}'$ that runs on $G'$ no longer needs to generate
private messages using the nodes' private randomness, since
all the channels in $G$ are preserved in $G'$. In other words,
under this alternative reduction, any $T$-step algorithm $\mathcal{A}$ that
runs on $G$ can be simulated by a $T$-step algorithm $\mathcal{A}'$ of the
same deterministic type as $\mathcal{A}$ that runs on $G'$. However, this
alternative reduction increases the information transmission
capability of the self-loops in $G'$, and will result in a looser
lower bound on $T(\varepsilon,\delta)$, as will be discussed in the remark
following Theorem 3.

In light of Lemma 7, in order to lower-bound $T(\varepsilon,\delta)$ for
computing $Z$ on $G$, we just need to lower-bound $T'(\varepsilon,\delta)$
defined in (29). To this end, we derive upper bounds on
the conditional mutual information for bidirected chains by
extending the techniques behind Lemma 2 and Lemma 5:

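Lemma 8. Consider an $n$-node bidirected chain with vertex
set $\mathcal{V} = \{1,\ldots,n\}$ and edge set

\[ \mathcal{E} = \{(i,i-1)\}_{i=2}^{n} \cup \{(i,i+1)\}_{i=1}^{n-1} \cup \{(i,i)\}_{i=1}^{n}, \]

and an arbitrary randomized $T$-step algorithm $\mathcal{A}'$ that runs
on this chain. Let $\eta_i \triangleq \eta(K_i)$ denote the SDPI constant of the
channel $K_i \triangleq \bigotimes_{j:(j,i) \in \mathcal{E}} K_{(j,i)}$, and let $\tilde{\eta} \triangleq \max_{i=1,\ldots,n} \eta_i$.
If $T \le n - 2$, then

\[ I(Z;\widehat{Z}_n|W_{2:n}) = 0. \]

If $T \ge n - 1$, then

\[
I(Z;\widehat{Z}_n|W_{2:n}) \le
\begin{cases}
H(W_1|W_{2:n})\, \tilde{\eta} \displaystyle\sum_{i=1}^{T-n+2} B(T-i,\, n-2,\, \tilde{\eta}), & n \ge 2 \quad (30\text{a}) \\[2ex]
C_{(1,2)}\, \tilde{\eta} \displaystyle\sum_{i=1}^{T-n+2} B(T-i-1,\, n-3,\, \tilde{\eta})\, i, & n \ge 3 \quad (30\text{b})
\end{cases}
\]

with $B(m,k,p) \triangleq \binom{m}{k} p^k (1-p)^{m-k}$. For $n \ge 2$, the above
upper bounds can be weakened to

\[
I(Z;\widehat{Z}_n|W_{2:n}) \le
\begin{cases}
H(W_1|W_{2:n})\, \big(1 - (1-\tilde{\eta})^{T-n+2}\big)^{n-1}, & (31\text{a}) \\[1ex]
C_{(1,2)}\, (T-n+2)\, \big(1 - (1-\tilde{\eta})^{T-n+2}\big)^{n-2}. & (31\text{b})
\end{cases}
\]

Moreover, if $n \ge 4$ and

\[ n - 1 \le T \le 2 + \frac{\gamma(n-3)}{\tilde{\eta}} \]

for some $\gamma \in (0,1)$, then

\[ I(Z;\widehat{Z}_n|W_{2:n}) \le C_{(1,2)}\, \frac{(n-3)^2 \gamma^2}{\tilde{\eta}} \exp\left(-2\Big(\frac{\tilde{\eta}}{\gamma} - \tilde{\eta}\Big)^2 (n-3)\right). \tag{32} \]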


Proof: Appendix B. ■ 

Equation (30a) is reminiscent of a result of Rajagopalan
and Schulman [13] on the evolution of mutual information in
broadcasting a bit over a unidirectional chain of BSCs. The
result in [13] is obtained by solving a system of recursive
inequalities on the mutual information involving suboptimal
SDPI constants. Our results apply to chains of general bidirectional
links and to the computation of general functions. We
arrive at a system of inequalities similar to the one in [13],
which can be solved in a similar manner and gives (30a) and
(30b). We also obtain weakened upper bounds in (31a) and
(31b), which show that, for a fixed $T$, the conditional mutual
information decays at least exponentially fast in $n$. The upper
bound in (32) provides another weakening of (30a) and (30b),
and shows explicitly the dependence of the upper bound on
$n$.

Assuming for simplicity that $H(W_1|W_{2:n}) = 1$, Fig. 6
compares (30a) with the weakened upper bound in (31a).
We can see that the gap can be large when $n$ is large and
$T$ is much larger than $n$. Nevertheless, the weakened upper
bounds in (31a) and (31b) allow us to derive lower bounds on
computation time that are non-asymptotic in $n$, and explicit in
$\varepsilon$, $\delta$, and channel properties.
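The comparison between (30a) and its weakened form (31a) can be reproduced numerically. The Python sketch below is illustrative only: the function names are hypothetical, the default $H = 1$ mirrors the normalization $H(W_1|W_{2:n}) = 1$ used for Fig. 6, and `eta` stands for the worst-case SDPI constant of the chain.

```python
from math import comb


def B(m: int, k: int, p: float) -> float:
    """Binomial probability B(m, k, p) = C(m, k) p^k (1 - p)^(m - k)."""
    return comb(m, k) * p ** k * (1 - p) ** (m - k)


def bound_30a(T: int, n: int, eta: float, H: float = 1.0) -> float:
    """Upper bound (30a); equals zero when T <= n - 2."""
    if T <= n - 2:
        return 0.0
    return H * eta * sum(B(T - i, n - 2, eta) for i in range(1, T - n + 3))


def bound_31a(T: int, n: int, eta: float, H: float = 1.0) -> float:
    """Weakened upper bound (31a); equals zero when T <= n - 2."""
    if T <= n - 2:
        return 0.0
    return H * (1 - (1 - eta) ** (T - n + 2)) ** (n - 1)


if __name__ == "__main__":
    eta = 0.16  # assumed value: SDPI constant of a BSC(0.3)
    for n in (2, 5, 10):
        for T in (n - 1, 2 * n, 5 * n):
            print(n, T, bound_30a(T, n, eta), bound_31a(T, n, eta))
```

For $n = 2$ the two expressions coincide and reduce to the single-cutset bound $1 - (1-\tilde{\eta})^T$ of Lemma 5, which gives a quick sanity check of the implementation.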


B. Lower bounds on computation time 

We now build on the results presented above to obtain lower
bounds on $T(\varepsilon,\delta)$ by reducing the original problem to
function computation over bidirected chains. We first provide
the result for an arbitrary network, and then particularize it to
several specific topologies (namely, chains, rings, grids, and
trees).



Fig. 6 : Upper bound in (30a) (solid line) vs. the weakened one 
in (31a) (dashed line) for chains. 


1) Lower bound for an arbitrary network: Theorem 3 below
contains general lower bounds on computation time for an
arbitrary network. The statement of the theorem is somewhat
lengthy, but can be parsed as follows: Given an arbitrary
connected network with bidirectional links, any reduction of
that network to a bidirected chain gives rise to a system of
inequalities that must be satisfied by the computation time
$T(\varepsilon,\delta)$. These inequalities, presented in (33), are nonasymptotic
in nature and involve explicitly computable parameters
of the network, but cannot be solved in closed form. The first
inequality follows from an SDPI-based analysis analogous to
Theorem 2, while the second inequality is a cutset bound in
the spirit of Theorem 1. Explicit but weaker expressions that
lower-bound $T(\varepsilon,\delta)$ in terms of network parameters appear
below as (34) and (35), together with asymptotic expressions
for large $n$ (the size of the reduced bidirected chain). Both of
these bounds state that $T(\varepsilon,\delta)$ is lower-bounded by the size
of the bidirected chain plus a correction term that accounts
for the effect of channel noise (via channel capacities and
SDPI constants). Finally, (36) and (37) provide the precise
version of the bound in (8): asymptotically, the computation
time $T(\varepsilon,\delta)$ scales as $\Omega(n/\tilde{\eta})$, where $\tilde{\eta}$ is the worst-case SDPI
constant of the reduced network. By Lemma 6, it is always
possible to reduce the network to a bidirected chain of length
$\mathrm{diam}(G) + 1$, so the main message of Theorem 3 is that the
computation time $T(\varepsilon,\delta)$ scales at least linearly in the network
diameter. Thus, the main advantage of the multi-cutset analysis
over the usual single-cutset analysis is that it can capture this
dependence on the network diameter.

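Theorem 3. Assume the following:

• The network graph $G = (\mathcal{V},\mathcal{E})$ is connected, the capacities
of all edge links are upper-bounded by $C$, and the
SDPI constants of edge links are upper-bounded by $\eta$.

• $G$ admits a successive partition into $\mathcal{S}_1,\ldots,\mathcal{S}_n$, such that
$\overleftarrow{\partial}\mathcal{S}_i$'s are all nonempty.

Let

\[ \Delta \triangleq \max_{i \in \{2:n\}} d_i, \]

where

\[ d_i = |\mathcal{E}_{\mathcal{P}_{i-1}^c}| + |\mathcal{E}_{\mathcal{P}_i}| + |\mathcal{E} \cap (\mathcal{S}_i \times \overleftarrow{\partial}\mathcal{S}_i)| \]

as defined in (25), and let

\[ \tilde{\eta} \triangleq 1 - (1-\eta)^{\Delta}. \]

Then for $\varepsilon \ge 0$ and $\delta \in (0,1/2]$, the $(\varepsilon,\delta)$-computation time
$T(\varepsilon,\delta)$ must satisfy the inequalities

\[
\ell(\mathcal{S}_1^c,\varepsilon,\delta) \le
\begin{cases}
H(W_{\mathcal{S}_1}|W_{\mathcal{S}_1^c})\, \tilde{\eta} \displaystyle\sum_{i=1}^{T(\varepsilon,\delta)-n+2} B(T(\varepsilon,\delta)-i,\, n-2,\, \tilde{\eta}), & n \ge 2 \\[2ex]
C_{\mathcal{S}_1^c}\, \tilde{\eta} \displaystyle\sum_{i=1}^{T(\varepsilon,\delta)-n+2} B(T(\varepsilon,\delta)-i-1,\, n-3,\, \tilde{\eta})\, i, & n \ge 3.
\end{cases}
\tag{33}
\]

The above results can be weakened to

\[
\begin{aligned}
T(\varepsilon,\delta) &\ge \frac{\log\bigg(1 - \Big(\frac{\ell(\mathcal{S}_1^c,\varepsilon,\delta)}{H(W_{\mathcal{S}_1}|W_{\mathcal{S}_1^c})}\Big)^{\frac{1}{n-1}}\bigg)^{-1}}{\Delta \log(1-\eta)^{-1}} + n - 2 && (34) \\
&\sim \frac{\log(n-1) + \log\Big(1 - \frac{\ell(\mathcal{S}_1^c,\varepsilon,\delta)}{H(W_{\mathcal{S}_1}|W_{\mathcal{S}_1^c})}\Big)^{-1}}{\Delta \log(1-\eta)^{-1}} + n - 2
\end{aligned}
\]

as $n \to \infty$, and

\[ T(\varepsilon,\delta) \ge \frac{\ell(\mathcal{S}_1^c,\varepsilon,\delta)}{C_{\mathcal{S}_1^c}} + n - 2. \tag{35} \]

Moreover, if the partition size $n$ is large enough, so that $n \ge 4$
and

\[ \frac{C|\mathcal{V}|^2 (n-3)^2}{4\eta} \exp\big(-2\eta^2(n-3)\big) < \ell(\mathcal{S}_1^c,\varepsilon,\delta), \tag{36} \]

then

\[ T(\varepsilon,\delta) \ge 2 + \frac{n-3}{2\tilde{\eta}} \ge 2 + \frac{n-3}{2\Delta\eta}. \tag{37} \]

Proof: In light of Lemma 7, it suffices to show that the
lower bounds in Theorem 3 need to be satisfied by $T'(\varepsilon,\delta)$
for the bidirected chain $G'$, to which $G$ reduces according to
the partition $\{\mathcal{S}_i\}$.

Consider any randomized $T$-step algorithm $\mathcal{A}'$ that achieves
$\max_{i' \in \mathcal{V}'} \mathbb{P}[d(Z,\widehat{Z}_{i'}) > \varepsilon] \le \delta$ on $G'$. From Lemma 1,

\[ I(Z;\widehat{Z}_{n'}|W_{2':n'}) \ge \ell(\{2':n'\},\varepsilon,\delta). \]

Then from Lemma 8 and the fact that

\[ \eta_{i'} = \eta\big(K_{((i-1)',i')} \otimes K_{((i+1)',i')} \otimes K_{(i',i')}\big) \le 1 - (1-\eta)^{d_i} \le 1 - (1-\eta)^{\Delta}, \tag{38} \]

we have

\[
\ell(\{2':n'\},\varepsilon,\delta) \le
\begin{cases}
H(W_{1'}|W_{2':n'})\, \tilde{\eta} \displaystyle\sum_{i=1}^{T-n+2} B(T-i,\, n-2,\, \tilde{\eta}), & n \ge 2 \\[2ex]
C_{(1',2')}\, \tilde{\eta} \displaystyle\sum_{i=1}^{T-n+2} B(T-i-1,\, n-3,\, \tilde{\eta})\, i, & n \ge 3,
\end{cases}
\tag{39}
\]

and for $n \ge 2$,

\[
\ell(\{2':n'\},\varepsilon,\delta) \le
\begin{cases}
H(W_{1'}|W_{2':n'}) \displaystyle\prod_{i=2}^{n} \big(1 - (1-\eta)^{d_i(T-n+2)}\big) \\[2ex]
C_{(1',2')}\, (T-n+2) \displaystyle\prod_{i=3}^{n} \big(1 - (1-\eta)^{d_i(T-n+2)}\big).
\end{cases}
\]

Since $\ell(\{2':n'\},\varepsilon,\delta) = \ell(\mathcal{S}_1^c,\varepsilon,\delta)$, $H(W_{1'}|W_{2':n'}) =
H(W_{\mathcal{S}_1}|W_{\mathcal{S}_1^c})$, and $C_{(1',2')} = C_{\mathcal{S}_1^c}$, we see that $T'(\varepsilon,\delta)$ must
satisfy (33) in Theorem 3.

Using (38), (39) can be weakened to

\[
\ell(\mathcal{S}_1^c,\varepsilon,\delta) \le
\begin{cases}
H(W_{\mathcal{S}_1}|W_{\mathcal{S}_1^c})\, \big(1 - (1-\eta)^{\Delta(T-n+2)}\big)^{n-1} \\[1ex]
C_{\mathcal{S}_1^c}\, (T-n+2)\, \big(1 - (1-\eta)^{\Delta(T-n+2)}\big)^{n-2}.
\end{cases}
\tag{40}
\]

The first line of (40) leads to

\[
\begin{aligned}
T'(\varepsilon,\delta) &\ge \frac{\log\bigg(1 - \Big(\frac{\ell(\mathcal{S}_1^c,\varepsilon,\delta)}{H(W_{\mathcal{S}_1}|W_{\mathcal{S}_1^c})}\Big)^{\frac{1}{n-1}}\bigg)^{-1}}{\Delta \log(1-\eta)^{-1}} + n - 2 \\
&\sim \frac{\log(n-1) + \log\Big(1 - \frac{\ell(\mathcal{S}_1^c,\varepsilon,\delta)}{H(W_{\mathcal{S}_1}|W_{\mathcal{S}_1^c})}\Big)^{-1}}{\Delta \log(1-\eta)^{-1}} + n - 2,
\end{aligned}
\]

where the last step follows from the fact that $\log\big(1 - p^{1/n}\big)^{-1} \sim
\log n$ as $n \to \infty$ for $p \in (0,1)$. The second line of (40)
leads to

\[ T'(\varepsilon,\delta) \ge \frac{\ell(\mathcal{S}_1^c,\varepsilon,\delta)}{C_{\mathcal{S}_1^c}} + n - 2. \]

Finally, we prove that $T'(\varepsilon,\delta) = \Omega(n/\tilde{\eta})$ under the assumption
that (36) holds. Suppose that $T'(\varepsilon,\delta) \le 2 + (n-3)/2\tilde{\eta}$.
Then, from (32) in Lemma 8, we have

\[ \ell(\mathcal{S}_1^c,\varepsilon,\delta) \le C_{\mathcal{S}_1^c}\, \frac{(n-3)^2}{4\tilde{\eta}} \exp\big(-2\tilde{\eta}^2(n-3)\big), \quad \text{if } n \ge 4. \]

Note that $\Delta \ge 1$ by the assumption that $G$ is connected, thus
$\tilde{\eta} = 1 - (1-\eta)^{\Delta} \ge \eta$. Moreover, $C_{\mathcal{S}_1^c} \le C|\mathcal{E}| \le C|\mathcal{V}|^2$. As
a result,

\[ \ell(\mathcal{S}_1^c,\varepsilon,\delta) \le \frac{C|\mathcal{V}|^2 (n-3)^2}{4\eta} \exp\big(-2\eta^2(n-3)\big), \quad \text{if } n \ge 4, \]

which contradicts the assumption that (36) holds. Thus,

\[ T'(\varepsilon,\delta) > 2 + \frac{n-3}{2\tilde{\eta}} \ge 2 + \frac{n-3}{2\Delta\eta}. \]

Theorem 3 then follows from Lemma 7. ■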
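The explicit bounds (34), (35), and (37) are easy to evaluate once the network parameters are known. The Python sketch below is a minimal illustration under stated assumptions: the function and argument names are hypothetical, `ell`, `H`, and `C_cut` stand for $\ell(\mathcal{S}_1^c,\varepsilon,\delta)$, $H(W_{\mathcal{S}_1}|W_{\mathcal{S}_1^c})$, and $C_{\mathcal{S}_1^c}$ and must be supplied by the user, and the numerical values in the example are arbitrary.

```python
import math


def bound_34(ell: float, H: float, n: int, Delta: int, eta: float) -> float:
    """SDPI-type bound (34); requires 0 < ell < H, n >= 2, and 0 < eta < 1."""
    x = (ell / H) ** (1.0 / (n - 1))
    return math.log(1 / (1 - x)) / (Delta * math.log(1 / (1 - eta))) + n - 2


def bound_35(ell: float, C_cut: float, n: int) -> float:
    """Cutset-type bound (35)."""
    return ell / C_cut + n - 2


def condition_36(ell: float, C: float, num_nodes: int, n: int, eta: float) -> bool:
    """Check whether n >= 4 and inequality (36) holds."""
    lhs = C * num_nodes ** 2 * (n - 3) ** 2 / (4 * eta) * math.exp(-2 * eta ** 2 * (n - 3))
    return n >= 4 and lhs < ell


def bound_37(n: int, Delta: int, eta: float) -> float:
    """Bound (37), valid only when condition (36) holds."""
    eta_tilde = 1 - (1 - eta) ** Delta
    return 2 + (n - 3) / (2 * eta_tilde)


if __name__ == "__main__":
    # Arbitrary illustrative parameters: BSC(0.3) links (eta = 0.16), Delta = 2.
    ell, H, C_cut, Delta, eta = 0.5, 1.0, 0.25, 2, 0.16
    for n in (2, 10, 100, 1000):
        line = [n, round(bound_34(ell, H, n, Delta, eta), 2), round(bound_35(ell, C_cut, n), 2)]
        if condition_36(ell, 1.0, n, n, eta):
            line.append(round(bound_37(n, Delta, eta), 2))
        print(line)
```

Because (34) is a ratio of logarithms, the base of the logarithm does not matter there; (35) requires `ell` and `C_cut` to be expressed in the same units.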

Remarks:

• We call a node in $\mathcal{S}_i$ a boundary node if there is an edge
(either inward or outward) between it and a node in $\mathcal{S}_{i-1}$
or $\mathcal{S}_{i+1}$. Denote the set of boundary nodes of $\mathcal{S}_i$ by $\partial\mathcal{S}_i$.
The results in Theorem 3 can be weakened by replacing
$d_i$ with

\[ d_{\partial\mathcal{S}_i} \triangleq \sum_{v \in \partial\mathcal{S}_i} |\mathcal{E}_v|, \]

namely the summation of the in-degrees of boundary
nodes of $\mathcal{S}_i$, since $d_i \le d_{\partial\mathcal{S}_i}$ for $i \in \{2,\ldots,n\}$.

• As discussed in the remark following Lemma 7, an
alternative network reduction is to map all the channels
from $\mathcal{S}_i$ to $\mathcal{S}_i$ (instead of only mapping the channels from
$\mathcal{S}_i$ to $\overleftarrow{\partial}\mathcal{S}_i$) in the original network $G$ to the self-loop at
node $i'$ of the reduced chain $G'$. Using the same proof
strategy with this alternative reduction, we can obtain
lower bounds on $T(\varepsilon,\delta)$ of the same form as the results
in Theorem 3, but with $d_i$'s replaced by

\[ \bar{d}_i \triangleq |\mathcal{E}_{\mathcal{P}_{i-1}^c}| + |\mathcal{E}_{\mathcal{P}_i}| + |\mathcal{E} \cap (\mathcal{S}_i \times \mathcal{S}_i)|. \]

Since $d_i \le d_{\partial\mathcal{S}_i} \le \bar{d}_i$ for $i \in \{2,\ldots,n\}$, the lower bounds
on $T(\varepsilon,\delta)$ obtained by this alternative network reduction
are weaker than the results in Theorem 3, and are even
weaker than the results obtained by replacing $d_i$'s with
$d_{\partial\mathcal{S}_i}$'s.

• Due to Lemma 6, for a network $G$ of bidirectional links,
we can always find a successive partition of $G$ such that $n$
in Theorem 3 is equal to $\mathrm{diam}(G) + 1$. By contrast, the
diameter cannot be captured in general by the theorems
in Section II.

• Choosing a successive partition of $G$ with $n = 2$ is
equivalent to choosing a single cutset. In that case, we
see that (35) recovers Theorem 1, while (34) recovers a
weakened version of Theorem 2 (in (34), $\Delta = d_2$ is at
least the sum of the in-degrees of the left-bound nodes of
$\mathcal{S}_2$, while Theorem 2 involves the in-degree of only one
node in $\mathcal{S}_2$).

We now apply Theorem 3 to networks with specific topologies.
We assume that nodes communicate via bidirectional links.
Thus, any such network will be represented by an undirected
graph, where each undirected edge represents a pair of channels
with opposite directions.

2) Chains: For chains, the proof of Theorem 3 already
contains lower bounds on $T'(\varepsilon,\delta)$. These lower bounds apply
to $T(\varepsilon,\delta)$ as well, since the class of T-step algorithms on a
chain is a subcollection of randomized T-step algorithms on
the same chain. We thus have the following corollary.

Corollary 3. Consider an n-node bidirected chain without
self-loops, where the SDPI constants of all channels are upper
bounded by $\eta$. Then for $\varepsilon > 0$ and $\delta \in (0,1/2]$, $T(\varepsilon,\delta)$
must satisfy the inequalities in Theorem 3 with $S_1 = \{1\}$ and
$d_i = 2$ for all $i \in \{1,\ldots,n\}$. In particular, if all channels are
BSC(p), then

$$T(\varepsilon,\delta) \ge \max\left\{ \frac{\ell(V\setminus\{1\},\varepsilon,\delta)}{1-h_2(p)},\; \frac{\log(n-1) + \log\left(1 - \frac{\ell(V\setminus\{1\},\varepsilon,\delta)}{H(W_1|W_{V\setminus\{1\}})}\right)^{-1}}{2\log(4p\bar{p})^{-1}} + n - 2 \right\}$$

for all sufficiently large n.

Here and below, the estimates for a network of bidirectional 
BSCs are obtained using the bounds (16) and (17). 
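
As a rough numerical aid for the BSC estimates, the sketch below evaluates the two channel quantities that appear in the denominators of the bounds. It assumes that (16) and (17) are the standard facts that BSC(p) has capacity $1 - h_2(p)$ bits and SDPI (KL contraction) constant $(1-2p)^2$, so that $1 - \eta = 4p(1-p)$; the function names are illustrative and do not come from the paper.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - h2(p)

def bsc_sdpi_constant(p):
    """Assumed form of the SDPI constant of BSC(p): (1 - 2p)^2."""
    return (1.0 - 2.0 * p) ** 2

def log_inverse_dissipation(p):
    """log2 of 1 / (4 p (1 - p)), i.e. log(1 / (1 - eta)) for BSC(p)."""
    return math.log2(1.0 / (4.0 * p * (1.0 - p)))

if __name__ == "__main__":
    for p in (0.01, 0.1, 0.25):
        print(p, bsc_capacity(p), bsc_sdpi_constant(p), log_inverse_dissipation(p))
```

Both quantities vanish as p approaches 1/2, which is where the capacity-based and the SDPI-based terms of the bounds become most restrictive.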

3) Rings: Consider a ring with 2n - 2 nodes, where the
nodes are labeled clockwise from 1 to 2n - 2. The diameter
is equal to n - 1. According to the successive partition in the
proof of Lemma 6, this ring can be partitioned into $S_1 = \{1\}$,
$S_i = \{i, 2n - i\}$, $i \in \{2,\ldots,n-1\}$, and $S_n = \{n\}$. As
an example, Fig. 7a shows a 6-node ring and Fig. 7b shows
the chain reduced from it. With this partition, we can apply
Theorem 3 and get the following corollary.

Fig. 7: A ring network and the chain reduced from it.


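Corollary 4. Consider a (2n - 2)-node ring, where the SDPI
constants of all channels are upper bounded by $\eta$. Then for
$\varepsilon > 0$ and $\delta \in (0,1/2]$, $T(\varepsilon,\delta)$ must satisfy the inequalities in
Theorem 3 with $S_1 = \{1\}$ and $d_i = 4$ for all $i \in \{1,\ldots,n\}$.
In particular, if all channels are BSC(p), then

$$T(\varepsilon,\delta) \ge \max\left\{ \frac{\ell(V\setminus\{1\},\varepsilon,\delta)}{2(1-h_2(p))},\; \frac{\log(n-1) + \log\left(1 - \frac{\ell(V\setminus\{1\},\varepsilon,\delta)}{H(W_1|W_{V\setminus\{1\}})}\right)^{-1}}{4\log(4p\bar{p})^{-1}} + n - 2 \right\}$$

for all sufficiently large n.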


4) Grids: Consider an $\frac{n+1}{2} \times \frac{n+1}{2}$ grid (where we assume n
is odd), which has diameter n - 1. Figure 8a shows a successive
partition of an $\frac{n+1}{2} \times \frac{n+1}{2}$ grid into $\frac{n+1}{2}$ subsets, with
$\Delta = \max_{i \in \{2:\frac{n+1}{2}\}} d_i = 2n$. Figure 8b shows the successive partition
in the proof of Lemma 6, which partitions the network into n
subsets, with $\Delta = \max_{i \in \{2:n\}} d_i = 2(n-1)$, thus resulting in
strictly tighter lower bounds on computation time compared
to the ones obtained from the partition in Fig. 8a. With the
latter partition, we get the following corollary.

Fig. 8: Successive partitions of a 4 x 4 (n = 7) grid network.
The length of the labeled path is the diameter of the network.

Corollary 5. Consider an $\frac{n+1}{2} \times \frac{n+1}{2}$ grid, where $1 - \cdots - n$
is one of the longest paths. Assume that the SDPI constants
of all channels are upper bounded by $\eta$. Then for $\varepsilon > 0$ and
$\delta \in (0,1/2]$, $T(\varepsilon,\delta)$ must satisfy the inequalities in Theorem 3
with $S_1 = \{1\}$, $d_i = d_{n+1-i} = 4(i-2)+6$, $i \in \{1,\ldots,\frac{n-1}{2}\}$,
and $d_{(n+1)/2} = 2(n-1)$. In particular, if all channels are
BSC(p), then

$$T(\varepsilon,\delta) \ge \max\left\{ \frac{\ell(V\setminus\{1\},\varepsilon,\delta)}{2(1-h_2(p))},\; \frac{\log(n-1) + \log\left(1 - \frac{\ell(V\setminus\{1\},\varepsilon,\delta)}{H(W_1|W_{V\setminus\{1\}})}\right)^{-1}}{2(n-1)\log(4p\bar{p})^{-1}} + n - 2 \right\}$$

for all sufficiently large n.

5) Trees: Consider a tree, whose nodes are numbered in
such a way that $1 - \cdots - n$ is one of the longest paths. Then the
diameter of the tree is n - 1, and nodes 1 and n are necessarily
leaf nodes. The tree can be viewed as being rooted at node
1. Let $\mathcal{D}_i$ be the union of node i and its descendants in the
rooted tree, and let $S_i = \mathcal{D}_i \setminus \mathcal{D}_{i+1}$, $i \in \{1,\ldots,n\}$. The tree
can then be successively partitioned into $S_1,\ldots,S_n$. In the n-node
bidirected chain reduced according to this partition, the
edges between nodes i' and (i+1)' are the pair of channels
between nodes i and i+1 in the tree, and the self-loop of node
i', $i \in \{2,\ldots,n-1\}$, is the channel from $S_i \setminus \{i\}$ to node i in
the tree. As an example, Fig. 9a shows this partition of a tree
network, and the chain reduced from it has the same form as
the one in Fig. 4b. With this partition, we get the following
corollary.

Fig. 9: Successive partitions of a tree network.

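Corollary 6. Consider a d-regular tree network where $1 - \cdots - n$
is one of the longest paths. Assume that the SDPI
constants of all channels are upper bounded by $\eta$. Then for
$\varepsilon > 0$ and $\delta \in (0,1/2]$, $T(\varepsilon,\delta)$ must satisfy the inequalities in
Theorem 3 with $S_1 = \{1\}$ and $d_i = d$ for all $i \in \{1,\ldots,n\}$.
In particular, if all channels are BSC(p), then

$$T(\varepsilon,\delta) \ge \max\left\{ \frac{\ell(V\setminus\{1\},\varepsilon,\delta)}{1-h_2(p)},\; \frac{\log(n-1) + \log\left(1 - \frac{\ell(V\setminus\{1\},\varepsilon,\delta)}{H(W_1|W_{V\setminus\{1\}})}\right)^{-1}}{d\log(4p\bar{p})^{-1}} + n - 2 \right\}$$

for all sufficiently large n.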


If we use the successive partition in the proof of Lemma 6
on a d-regular tree with diameter n - 1, then the tree will
be reduced to an n-node bidirected chain without self-loops.
Figure 9b shows such an example. However, with this partition,
$\Delta = \max_{i \in \{2:n\}} d_i$ increases with n, which renders the
resulting lower bound on computation time looser than the
one in Corollary 6. This means that, although the partition in the
proof of Lemma 6 always captures the diameter of a network,
it may not always give the best lower bound on computation
time among all possible successive partitions.


IV. Small ball probability estimates for computation of linear functions

The bounds stated in the preceding sections involve the 
conditional small ball probability, defined in (9). In this 
section, we provide estimates for this quantity in the context 
of a distributed computation problem of wide interest — 
the computation of linear functions. Specifically, we assume
that the observations $W_v$, $v \in V$, are independent real-valued
random variables, and the objective is to compute a linear
function

$$Z = f(W) = \sum_{v \in V} a_v W_v \tag{41}$$

for a fixed vector of coefficients $(a_v)_{v \in V} \in \mathbb{R}^{|V|}$, subject to
the absolute error criterion $d(z,\hat{z}) = |z - \hat{z}|$. We will use the
following shorthand notation: for any set $S \subset V$, let $a_S = (a_v)_{v \in S}$
and $\langle a_S, W_S \rangle = \sum_{v \in S} a_v W_v$.

The independence of the $W_v$'s and the additive structure of
$f$ allow us to express the conditional small ball probability
$L(w_S,\varepsilon)$ defined in (9) in terms of so-called Lévy concentration
functions of random sums [19]. The Lévy concentration
function of a real-valued r.v. U (also known as the "small ball
probability") is defined as

$$\mathcal{L}(U,\rho) = \sup_{u \in \mathbb{R}} P\big[|U - u| \le \rho\big], \quad \rho > 0. \tag{42}$$

If we fix a subset $S \subset V$, and consider a specific realization
$W_S = w_S$ of the observations of the nodes in S, then

$$\begin{aligned}
L(w_S,\varepsilon) &= \sup_{z \in \mathbb{R}} P\Big[\Big|\sum_{v \in V} a_v W_v - z\Big| \le \varepsilon \,\Big|\, W_S = w_S\Big] \\
&= \sup_{z \in \mathbb{R}} P\Big[\Big|\sum_{v \in S^c} a_v W_v + \sum_{v \in S} a_v w_v - z\Big| \le \varepsilon\Big] \\
&= \sup_{z \in \mathbb{R}} P\Big[\Big|\sum_{v \in S^c} a_v W_v - z\Big| \le \varepsilon\Big] \\
&= \mathcal{L}\big(\langle a_{S^c}, W_{S^c} \rangle, \varepsilon\big),
\end{aligned} \tag{43}$$

where in the second line we have used the fact that the $W_v$'s
are independent r.v.'s, while in the third line we have used
the fact that for any function $g : \mathbb{R} \to \mathbb{R}$ and any $a \in \mathbb{R}$,
$\sup_z g(z) = \sup_z g(z + a)$. In other words, for a fixed S, the
quantity $L(w_S,\varepsilon)$ is independent of the boundary condition
$w_S$, and is controlled by the probability law of the random
sum $\langle a_{S^c}, W_{S^c} \rangle$, i.e., the part of the function $f$ that depends
on the observations of the nodes in $S^c$.
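
The independence of $L(w_S,\varepsilon)$ from the boundary condition can be checked exactly on a toy instance. The sketch below is only an illustration under stated assumptions: the observations are taken to have finite supports (so that the supremum in (42) is a maximum over finitely many windows), and the helper names `levy_concentration`, `weighted_sum_pmf` and `conditional_small_ball` are hypothetical, not from the paper.

```python
from fractions import Fraction

def levy_concentration(pmf, rho):
    """max over u of P[|U - u| <= rho] for a finitely supported pmf {value: prob}.

    A closed window of width 2*rho can be slid to the right until its left end
    hits an atom without losing mass, so it suffices to try windows that
    start at an atom.
    """
    best = Fraction(0)
    for left in pmf:
        mass = sum(pr for x, pr in pmf.items() if left <= x <= left + 2 * rho)
        best = max(best, mass)
    return best

def weighted_sum_pmf(coeffs, pmfs):
    """Exact pmf of sum_v a_v * W_v for independent, finitely supported W_v."""
    out = {0: Fraction(1)}
    for a, pmf in zip(coeffs, pmfs):
        nxt = {}
        for s, ps in out.items():
            for w, pw in pmf.items():
                nxt[s + a * w] = nxt.get(s + a * w, 0) + ps * pw
        out = nxt
    return out

def conditional_small_ball(a, pmfs, w_S, eps):
    """L(w_S, eps): the nodes listed in the dict w_S are fixed, the rest are random."""
    shift = sum(a[v] * w for v, w in w_S.items())
    rest = [v for v in range(len(a)) if v not in w_S]
    pmf_rest = weighted_sum_pmf([a[v] for v in rest], [pmfs[v] for v in rest])
    shifted = {x + shift: p for x, p in pmf_rest.items()}
    return levy_concentration(shifted, eps)

if __name__ == "__main__":
    coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # toy observation law
    a = [1, 2, 3]
    for w0 in (0, 1):
        print(w0, conditional_small_ball(a, [coin] * 3, {0: w0}, Fraction(1, 2)))
```

Changing the fixed value of node 0 only translates the law of the remaining sum, so both calls return the same number, in line with the third step of (43).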

The problem of estimating Lévy concentration functions
of sums of independent random variables has a long history
in the theory of probability. For random variables with
densities, some of the first results go back at least to Kolmogorov
[34], while for discrete random variables it is closely
related to the so-called Littlewood-Offord problem [35]. We
provide a few examples to illustrate how one can exploit
available estimates for Lévy concentration functions under
various regularity conditions to obtain tight lower bounds
on the computation time for linear functions. The examples
are illustrated through Theorem 1, as it tightly captures the
dependence of computation time on $\ell(S,\varepsilon,\delta)$. (However, since
the results of Theorems 2 and 3 also involve the quantity
$\ell(S,\varepsilon,\delta)$, the estimates for Lévy concentration functions can
be applied there as well.)


A. Computing linear functions of continuous observations 

1) Gaussian sums: Suppose that the local observations $W_v$,
$v \in V$, are i.i.d. standard Gaussian random variables. Then,
for any $S \subset V$, $\langle a_S, W_S \rangle$ is a zero-mean Gaussian r.v. with
variance $\|a_S\|_2^2 = \sum_{v \in S} a_v^2$ (here, $\|\cdot\|_2$ is the usual Euclidean
$\ell^2$ norm). A simple calculation shows that

$$L(w_S,\varepsilon) = \mathcal{L}\big(N(0, \|a_{S^c}\|_2^2), \varepsilon\big) \le \sqrt{\frac{2\varepsilon^2}{\pi \|a_{S^c}\|_2^2}}.$$

Using this in Theorem 1, we get the following result.

Corollary 7. For the problem of computing a linear function
in (41), where $(W_v) \overset{\text{i.i.d.}}{\sim} N(0,1)$, suppose that the coefficients
$a_v$ are all nonzero. Then for $\varepsilon > 0$ and $\delta \in (0,1/2]$,

$$T(\varepsilon,\delta) \ge \max_{S \subset V} \frac{1}{C_S}\left(\frac{1-\delta}{2} \log \frac{\pi \|a_{S^c}\|_2^2}{2\varepsilon^2} - h_2(\delta)\right).$$

Thus, the lower bound on the computation time for (41)
depends on the vector of coefficients a only through its $\ell^2$
norm.
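
The Gaussian small ball estimate can be sanity-checked numerically. In the sketch below (function names are illustrative), the exact concentration function of $N(0,\sigma^2)$ is computed from the error function, using the fact that a symmetric unimodal density puts the most mass on the window centered at its mode, and is compared with the density-based bound $2\varepsilon$ times the peak density, which equals $\sqrt{2/\pi}\,\varepsilon/\sigma$.

```python
import math

def gaussian_small_ball(sigma, eps):
    """Exact sup_z P[|N(0, sigma^2) - z| <= eps]; the supremum is attained at z = 0."""
    return math.erf(eps / (sigma * math.sqrt(2.0)))

def gaussian_small_ball_bound(sigma, eps):
    """Upper bound 2 * eps * (peak density) = sqrt(2 / pi) * eps / sigma."""
    return math.sqrt(2.0 / math.pi) * eps / sigma

if __name__ == "__main__":
    a_rest = [1.0, -2.0, 0.5]                      # hypothetical coefficients on S^c
    sigma = math.sqrt(sum(c * c for c in a_rest))  # Euclidean norm of a_{S^c}
    for eps in (1e-3, 1e-1, 1.0):
        print(eps, gaussian_small_ball(sigma, eps), gaussian_small_ball_bound(sigma, eps))
```

The two quantities agree to first order in $\varepsilon/\sigma$, so the bound loses essentially nothing in the high-accuracy regime.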

2) Sums of independent r.v.'s with log-concave distributions:
Another instance in which sharp bounds on the Lévy concentration
function are available is when the observations of
the nodes are independent random variables with log-concave
distributions. (We recall that a real-valued r.v. U is said to
have a log-concave distribution if it has a density of the
form $p_U(u) = e^{-F(u)}$, where $F : \mathbb{R} \to (-\infty, +\infty]$ is a
convex function; this includes Gaussian, Laplace, uniform,
etc.) The following result was obtained recently by Bobkov
and Chistyakov [36, Theorem 1.1]: Let $U_1,\ldots,U_k$ be independent
random variables with log-concave distributions, and
let $S_k = U_1 + \ldots + U_k$. Then, for any $\rho > 0$,

$$\frac{1}{\sqrt{3}} \frac{\rho}{\sqrt{\mathrm{Var}(S_k) + \rho^2/3}} \le \mathcal{L}(S_k,\rho) \le \frac{2\rho}{\sqrt{\mathrm{Var}(S_k) + \rho^2/3}}. \tag{44}$$


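Corollary 8. For the problem of computing a linear function
in (41), where the $W_v$'s are independent random variables
with log-concave distributions and with variances at least $\sigma^2$,
suppose that the coefficients $a_v$ are all nonzero. Then for $\varepsilon > 0$
and $\delta \in (0,1/2]$,

$$T(\varepsilon,\delta) \ge \max_{S \subset V} \frac{1}{C_S}\left(\frac{1-\delta}{2} \log \frac{\|a_{S^c}\|_2^2 \sigma^2}{4\varepsilon^2} - h_2(\delta)\right).$$

Proof: For each $v \in V$, $a_v W_v$ also has a log-concave
distribution, and, for any $S \subset V$,

$$\mathrm{Var}\big(\langle a_{S^c}, W_{S^c} \rangle\big) = \sum_{v \in S^c} |a_v|^2 \mathrm{Var}(W_v) \ge \|a_{S^c}\|_2^2 \sigma^2.$$

The lower bound follows from Theorem 1 and from (44). ■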

3) Sums of independent r.v.’s with bounded third moments: 
It is known that random variables with log-concave distribu¬ 
tions have bounded moments of any order. Under a much 
weaker assumption that the local observations $W_v$, $v \in V$, have
bounded third moments, we can prove the following result.


Corollary 9. Consider the problem of computing the linear
function in (41), where the $W_v$'s are independent zero-mean
r.v.'s with variances at least 1 and with third moments
bounded by B, and the coefficients $a_v$ satisfy the constraint
$K_1 \le |a_v| \le K_2$ for some $K_1, K_2 > 0$. Then for $\varepsilon > 0$ and
$\delta \in (0,1/2]$,

$$T(\varepsilon,\delta) \ge \max_{S \subset V} \frac{1}{C_S}\left(\frac{1-\delta}{2} \log \frac{|S^c|}{M(\varepsilon)^2} - h_2(\delta)\right),$$

where $M(\varepsilon) = c\big(\varepsilon/K_1 + B(K_2/K_1)^3\big)$ with some absolute
constant c.

Proof: Under the conditions of the corollary, a small ball
estimate due to Rudelson and Vershynin [37, Corollary 2.10]
can be used to show that, for any $S \subset V$,

$$\mathcal{L}\big(\langle a_S, W_S \rangle, \varepsilon\big) \le \frac{M(\varepsilon)}{\sqrt{|S|}}.$$

The desired conclusion follows immediately. ■


B. Linear vector-valued functions 

Similar to the Lévy concentration function of a real-valued
random variable, the Lévy concentration function of a random
vector U taking values in $\mathbb{R}^n$ can be defined as

$$\mathcal{L}(U,\rho) = \sup_{u \in \mathbb{R}^n} P\big[\|U - u\|_2 \le \rho\big], \quad \rho > 0.$$

Consider the case where each node observes an independent
real-valued random variable $W_v$, and the observations form
a $|V| \times 1$ vector $W_V$. Suppose the nodes wish to compute a
linear transform of $W_V$,

$$Z = A W_V \tag{45}$$

with some fixed $n \times |V|$ matrix A, subject to the Euclidean-norm
distortion criterion $d(z,\hat{z}) = \|z - \hat{z}\|_2$. In this case

$$\begin{aligned}
L(w_S,\varepsilon) &= \sup_{z \in \mathbb{R}^n} P\big[\|A W_V - z\|_2 \le \varepsilon \,\big|\, W_S = w_S\big] \\
&= \sup_{z \in \mathbb{R}^n} P\big[\|A_{S^c} W_{S^c} + A_S w_S - z\|_2 \le \varepsilon\big] \\
&= \sup_{z \in \mathbb{R}^n} P\big[\|A_{S^c} W_{S^c} - z\|_2 \le \varepsilon\big] \\
&= \mathcal{L}(A_{S^c} W_{S^c}, \varepsilon),
\end{aligned}$$

where $A_{S^c}$ is the submatrix formed by the columns of
A with indices in $S^c$. We will need the following result,
due to Rudelson and Vershynin [38]. Let $s_j(A_{S^c})$, $j = 1,\ldots,\min\{n,|S^c|\}$,
denote the singular values of $A_{S^c}$ arranged
in non-increasing order, and define the stable rank of
$A_{S^c}$ by

$$r(A_{S^c}) = \frac{\|A_{S^c}\|_{\mathrm{HS}}^2}{\|A_{S^c}\|^2},$$

where $\|A_{S^c}\|_{\mathrm{HS}} = \big(\sum_{j=1}^{\min\{n,|S^c|\}} s_j(A_{S^c})^2\big)^{1/2}$ is the
Hilbert-Schmidt norm of $A_{S^c}$ and $\|A_{S^c}\| = s_1(A_{S^c})$ is the
spectral norm of $A_{S^c}$. (Note that for any nonzero matrix $A_{S^c}$,
$1 \le r(A_{S^c}) \le \mathrm{rank}(A_{S^c})$.) Then, provided

$$\mathcal{L}\big(W_v, \varepsilon/\|A_{S^c}\|_{\mathrm{HS}}\big) \le p$$

for all $v \in S^c$, we will have

$$\mathcal{L}(A_{S^c} W_{S^c}, \varepsilon) \le (cp)^{0.9\, r(A_{S^c})},$$

where c is an absolute constant [38, Theorem 1.4]. This
result relates the Lévy concentration function of the linear
transform of a vector to the Lévy concentration function
of each coordinate of the vector. Applying this result in
Theorem 1, we get a lower bound on $T(\varepsilon,\delta)$ for computing
linear vector-valued functions.
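
To give a feel for the stable rank that drives this estimate, here is a minimal dependency-free sketch that computes the ratio of the squared Hilbert-Schmidt norm to the squared spectral norm of a small matrix. The power iteration used for the spectral norm and all function names are choices made for this illustration only; in practice one would read the singular values off a numerical linear algebra library.

```python
import math

def hs_norm_sq(A):
    """Squared Hilbert-Schmidt (Frobenius) norm: sum of squared entries."""
    return sum(x * x for row in A for x in row)

def spectral_norm_sq(A, iters=500):
    """Largest eigenvalue of A^T A (squared spectral norm) via power iteration."""
    rows, cols = len(A), len(A[0])
    x = [1.0 + 0.01 * j for j in range(cols)]          # deterministic start vector
    norm = math.sqrt(sum(v * v for v in x))
    x = [v / norm for v in x]
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]
        y = [sum(A[i][j] * Ax[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return 0.0
        x = [v / norm for v in y]
    Ax = [sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]
    return sum(v * v for v in Ax)                      # Rayleigh quotient at unit x

def stable_rank(A):
    """Ratio of squared Hilbert-Schmidt norm to squared spectral norm."""
    return hs_norm_sq(A) / spectral_norm_sq(A)

if __name__ == "__main__":
    print(stable_rank([[1.0, 0.0], [0.0, 1.0]]))   # well-spread singular values
    print(stable_rank([[1.0, 1.0], [1.0, 1.0]]))   # rank-one matrix
```

A matrix with equal singular values has stable rank equal to its rank, while a matrix dominated by one singular value has stable rank close to 1, so the exponent in the small ball estimate rewards transforms that spread energy over many directions.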


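Corollary 10. For the problem of computing a linear transform
of the observations defined in (45), where the $W_v$'s are
independent real-valued r.v.'s, suppose the rows of A are
nonzero vectors. Then for $\varepsilon > 0$ and $\delta \in (0,1/2]$,

$$T(\varepsilon,\delta) \ge \max_{S \subset V} \frac{1}{C_S}\left(0.9(1-\delta)\, r(A_{S^c}) \log \frac{1}{c \max_{v \in S^c} \mathcal{L}\big(W_v, \varepsilon/\|A_{S^c}\|_{\mathrm{HS}}\big)} - h_2(\delta)\right)$$

for some absolute constant c.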



C. Linear function of discrete observations 

Finally, we consider a case when the local observations
$W_v$ have discrete distributions. Specifically, let the $W_v$'s be
i.i.d. Rademacher random variables, i.e., each $W_v$ takes values
±1 with equal probability. We still use the absolute distortion
function $d(z,\hat{z}) = |z - \hat{z}|$ to quantify the estimation error. In
this case, the Lévy concentration function $\mathcal{L}(\langle a_S, W_S \rangle, \varepsilon)$ will
be highly sensitive to the direction of the vector $a_S$, rather than
just its norm. For example, consider the extreme case when
$a_v = |V|$ for a single node $v \in S$, and all other coefficients
are zero. Then $\mathcal{L}(\langle a_S, W_S \rangle, 0) = \mathcal{L}(|V| W_v, 0) = 1/2$. On the
other hand, if $a_v = 1$ for all $v \in V$ and $|S|$ is even, then

$$\mathcal{L}(\langle a_S, W_S \rangle, 0) = 2^{-|S|} \binom{|S|}{|S|/2} \sim \sqrt{\frac{2}{\pi |S|}} \quad \text{as } |S| \to \infty,$$

where the last step is due to Stirling's approximation. Moreover,
a celebrated result due to Littlewood and Offord, improved
later by Erdős [39], says that, if $|a_v| > 1$ for all v,
then

$$\mathcal{L}(\langle a_S, W_S \rangle, 1) \le 2^{-|S|} \binom{|S|}{\lfloor |S|/2 \rfloor} \sim \sqrt{\frac{2}{\pi |S|}} \quad \text{as } |S| \to \infty,$$

which translates into a lower bound on the $(1,\delta)$-computation
time which is of the same order as the lower bound on the
zero-error computation time.
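
The sensitivity to the direction of the coefficient vector is easy to see by brute force. The following sketch (names are illustrative) enumerates all sign patterns of a short Rademacher vector, computes the largest point mass of the weighted sum, and compares the all-ones case with the central binomial formula and its Stirling approximation.

```python
from fractions import Fraction
from itertools import product
from math import comb, pi, sqrt

def rademacher_max_atom(a):
    """max_u P[<a, W> = u] for i.i.d. Rademacher W, by enumerating 2^len(a) patterns."""
    counts = {}
    for signs in product((-1, 1), repeat=len(a)):
        s = sum(c * w for c, w in zip(a, signs))
        counts[s] = counts.get(s, 0) + 1
    return Fraction(max(counts.values()), 2 ** len(a))

if __name__ == "__main__":
    k = 10
    spread = rademacher_max_atom([1] * k)                    # all coefficients equal to 1
    concentrated = rademacher_max_atom([k] + [0] * (k - 1))  # one dominant coefficient
    print(spread, float(spread), sqrt(2.0 / (pi * k)))       # exact value vs. Stirling
    print(concentrated)
```

With the same number of nodes, the single-coefficient vector keeps the concentration at 1/2, whereas the all-ones vector drives it down at the rate $1/\sqrt{k}$; only the latter yields a lower bound that grows with the network size.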

Corollary 11. For the problem of computing the linear function
in (41), where the $W_v$'s are independent Rademacher
random variables, suppose that $|a_v| > 1$ for all v, and
$\delta < 1/2$. Then

$$T(0,\delta) \ge T(1,\delta) \gtrsim \max_{S \subset V} \frac{1}{C_S}\left(\frac{1-\delta}{2} \log \frac{\pi |S^c|}{2} - h_2(\delta)\right)$$

as $|S^c| \to \infty$.


D. Comparison with existing results 

We illustrate the utility of the above bounds through comparison
with some existing results. For example, Ayaso et
al. [1] derive lower bounds on a related quantity

$$\tilde{T}(\varepsilon,\delta) = \inf\Big\{T \in \mathbb{N} : \exists \text{ a } T\text{-step algorithm } \mathcal{A} \text{ such that } \max_{v \in V} P\big[\hat{Z}_v \notin [(1-\varepsilon)Z, (1+\varepsilon)Z]\big] < \delta\Big\}.$$

One of their results is as follows: if $Z = f(W)$ is a linear
function of the form (41) and $(W_v) \overset{\text{i.i.d.}}{\sim} \mathrm{Uniform}([1, 1+B])$
for some $B > 0$, then

$$\tilde{T}(\varepsilon,\delta) \ge \max_{S \subset V} \frac{|S|}{2 C_S} \log \frac{1}{B\varepsilon^2 + \kappa \delta^{2/|V|} + (1/B)^{2/|V|}} \tag{46}$$

for all sufficiently small $\varepsilon, \delta > 0$, where $\kappa > 0$ is a fixed
constant [1, Theorem III.5]. Let us compare (46) with what
we can obtain using our techniques. It is not hard to show that

$$\tilde{T}(\varepsilon,\delta) \ge T\big(\|a\|_1 (1+B)\varepsilon, \delta\big) \tag{47}$$

where $\|a\|_1 = \sum_{v \in V} |a_v|$ is the $\ell^1$ norm of a. Moreover, since
any r.v. uniformly distributed on a bounded interval of the real
line has a log-concave distribution, we can use Corollary 8 to
lower-bound the right-hand side of (47). This gives

$$\tilde{T}(\varepsilon,\delta) \ge \max_{S \subset V} \frac{1}{C_S}\left(\frac{1-\delta}{2} \log \frac{B^2 \|a_{S^c}\|_2^2}{48 (B+1)^2 \|a\|_1^2 \varepsilon^2} - h_2(\delta)\right) \tag{48}$$

for all sufficiently small $\varepsilon, \delta > 0$. We immediately see that
this bound is tighter than the one in (46). In particular, the
right-hand side of (46) remains bounded for vanishingly small
$\varepsilon$ and $\delta$, and in the limit of $\varepsilon, \delta \to 0$ tends to

$$\max_{S \subset V} \frac{|S| \log B}{C_S |V|} \le \frac{\log B}{\min_{S \subset V} C_S}.$$

By contrast, as $\varepsilon, \delta \to 0$, the right-hand side of (48) grows
without bound as $\log(1/\varepsilon)$.

Another lower bound on the $(\varepsilon,\delta)$-computation time $T(\varepsilon,\delta)$
was obtained by Como and Dahleh [2]. Their starting point
is the following continuum generalization of Fano's inequality
[2, Lemma 2] in terms of conditional differential entropy: if
$Z, \hat{Z}$ are two jointly distributed real-valued r.v.'s, such that
$EZ^2 < \infty$, then, for any $\varepsilon > 0$,

$$h(Z|\hat{Z}) \le P\big[|Z - \hat{Z}| \le \varepsilon\big] \log \varepsilon + \frac{1}{2} \log\big(16\pi e\, EZ^2\big). \tag{49}$$

If we use (49) instead of Lemma 1 to lower-bound
$I(Z; \hat{Z}_v | W_S)$, then we get

$$T(\varepsilon,\delta) \ge \max_{S \subset V} \frac{1}{C_S}\left((1-\delta) \log \frac{1}{\varepsilon} + h(Z|W_S) - \frac{1}{2} \log\big(16\pi e\, EZ^2\big)\right). \tag{50}$$

Again, let us consider the case when $Z = f(W)$ is a linear
function of the form (41) with all $a_v$ nonzero and with
$(W_v) \overset{\text{i.i.d.}}{\sim} N(0,1)$. Then (50) becomes

$$T(\varepsilon,\delta) \ge \max_{S \subset V} \frac{1}{C_S}\left((1-\delta) \log \frac{1}{\varepsilon} + \frac{1}{2} \log \frac{\|a_{S^c}\|_2^2}{8\|a\|_2^2}\right). \tag{51}$$

The lower bound of our Corollary 7 will be tighter than (51)
for all $\varepsilon > 0$ as long as

$$\frac{1-\delta}{2} \log \frac{\pi \|a_{S^c}\|_2^2}{2} - h_2(\delta) \ge \frac{1}{2} \log \frac{\|a_{S^c}\|_2^2}{8\|a\|_2^2}, \quad \forall S \subset V.$$

Note that the quantity on the right-hand side is nonpositive.
More generally, for observations with log-concave distributions,
the result of Lemma 1 can be weakened to get a lower
bound involving the conditional differential entropy $h(Z|W_S)$,
which is tighter than similar results obtained in [2].

Corollary 12. If the observations $W_v$, $v \in V$, have log-concave
distributions, then for computing the sum $Z = \sum_{v \in V} W_v$
subject to the absolute error criterion $d(z,\hat{z}) = |z - \hat{z}|$,
for $\varepsilon > 0$ and $\delta \in (0,1/2]$,

$$T(\varepsilon,\delta) \ge \max_{S \subset V} \frac{1}{C_S}\left((1-\delta)\Big(h(Z|W_S) + \log \frac{1}{2e\varepsilon}\Big) - h_2(\delta)\right).$$

Proof: Let $p_S(z)$ denote the probability density of
$\sum_{v \in S^c} W_v$. Then from (43),

$$L(w_S,\varepsilon) = \sup_{z \in \mathbb{R}} \int_{z-\varepsilon}^{z+\varepsilon} p_S(u)\, du \le 2\varepsilon \|p_S\|_\infty \tag{52}$$

for all $w_S$, where $\|p_S\|_\infty$ is the sup norm of $p_S$.
By a result of Bobkov and Madiman [40, Proposition 1.2], if U
is a real-valued r.v. with a log-concave density p, then the differential
entropy h(U) is upper-bounded by $\log e + \log \|p\|_\infty^{-1}$.
Using this fact together with (52), the log-concavity of $p_S$,
and the fact that the $W_v$'s are mutually independent, we can
write

$$\log \frac{1}{L(w_S,\varepsilon)} \ge \log \frac{1}{2\varepsilon} + \log \frac{1}{\|p_S\|_\infty} \ge \log \frac{1}{2\varepsilon} + h\Big(\sum_{v \in S^c} W_v\Big) - \log e = \log \frac{1}{2e\varepsilon} + h(Z|W_S).$$

Using this estimate in Theorem 1, we get the desired lower
bound on $T(\varepsilon,\delta)$. ■

V. Comparison with upper bounds on computation time

For the two-node mod-2 sum problem in Example 1, we
have shown in Corollary 1 that the lower bound on computation
time given by Theorem 2 can tightly match the upper bound. In
this section, we provide two more examples in which our lower
bounds on computation time are tight. In the first example, our
lower bound precisely captures the dependence of computation
time on the number of nodes in the network. In the second
example, our lower bound tightly captures the dependence of
computation time on the accuracy parameter $\varepsilon$.

A. Rademacher sum over a dumbbell network

Example 3. Consider a dumbbell network of bidirectional
BSCs with the same crossover probability. Formally, suppose
|V| is even, and let the nodes be indexed from 1 to |V|.
Nodes 1 to |V|/2 form a clique (i.e., each pair of nodes is
connected by a pair of BSCs), while nodes |V|/2 + 1 to |V| form
another clique. The two cliques are connected by a pair of
BSCs between nodes |V|/2 and |V|/2 + 1. Each node initially
observes a Bern(1/2) (or Rademacher) r.v. The goal is for the
nodes to compute the sum of the observations of all nodes.
The distortion function is $d(z,\hat{z}) = |z - \hat{z}|$.

By choosing the cutset as the pair of BSCs that joins the
two cliques, our lower bound for Rademacher sums in
Corollary 11 gives the following lower bound on computation
time.

Corollary 13. For the problem in Example 3, for $\delta \in (0,1/2)$,

$$T(0,\delta) \gtrsim \frac{1}{1 - h_2(p)}\left(\frac{1-\delta}{2} \log \frac{\pi |V|}{4} - h_2(\delta)\right) \quad \text{as } |V| \to \infty,$$

where p denotes the common crossover probability of the BSCs,
which implies

$$T(0,\delta) = \Omega(\log |V|).$$

Now we show that the above lower bound matches the upper
bound on the computation time, which turns out to be

$$T(0,\delta) = O(\log |V|).$$

As shown by Gallager [11], for a fixed success probability,
nodes |V|/2 and |V|/2 + 1 can learn the partial sum of the
observations in their respective cliques in $O(\log \log |V|)$ steps.
These two nodes then exchange their partial sum estimates
using binary block codes. Each partial sum can take |V|/2 + 1
values, and can be encoded losslessly with $\log(|V|/2 + 1)$ bits.
The blocklength needed for transmission of the encoded partial
sums is thus $O(\log(|V|/2 + 1))$, where the hidden factor
depends on the required success probability and the channel
crossover probability, but not on |V|. Having learned the partial
sum of the other clique, nodes |V|/2 and |V|/2 + 1 continue
to broadcast this partial sum to other nodes in their own
clique. This takes another $O(\log(|V|/2 + 1))$ steps. In total, the
computation can be done in $O(\log \log |V|) + 2\, O(\log(|V|/2 + 1)) = O(\log |V|)$
steps, to have all nodes learn the sum of
all observations, for any prescribed success probability. This
shows that $T(0,\delta) = O(\log |V|)$.


B. Distributed averaging over discrete noisy channels 
Example 4. Consider a network where the nodes are connected
by binary erasure channels with the same erasure
probability. Each node initially observes a log-concave r.v.
The goal is for the nodes to compute the average of the
observations of all nodes.

For this example, Carli et al. [14] define the computation
time as

$$\tau(\varepsilon) \triangleq \inf\Big\{T \in \mathbb{N} : \frac{1}{|V|} \sum_{v \in V} E\big[(Z - \hat{Z}_v(t))^2\big] \le \varepsilon,\ \forall t \ge T\Big\}$$

and show that

$$\tau(\varepsilon) \le c_1 + c_2 \frac{\log^3 \varepsilon^{-1}}{\log^2 \rho^{-1}} \tag{53}$$

where $\rho$ is the second largest singular value of the consensus
matrix adapted to the network, and $c_1$ and $c_2$ are positive
constants depending only on channel erasure probability. It
can be shown that the above upper bound still holds (with
different constants) when channels are BSCs.

We use Corollary 12 to derive the following lower bound
on $\tau(\varepsilon)$.


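Corollary 14. For the problem in Example 4,

$$\tau(\varepsilon) \ge \max_{S \subset V} \frac{1}{2 C_S}\left(h(Z|W_S) + \log \frac{1}{4e|V|} + \frac{1}{2} \log \frac{1}{\varepsilon} - 2\right). \tag{54}$$

Proof: Using Jensen's inequality twice, we can write

$$\frac{1}{|V|} \sum_{v \in V} E\big[(Z - \hat{Z}_v(T))^2\big] \ge \frac{1}{|V|} \sum_{v \in V} \big(E|Z - \hat{Z}_v(T)|\big)^2 \ge \Big(\frac{1}{|V|} \sum_{v \in V} E|Z - \hat{Z}_v(T)|\Big)^2.$$

Therefore, $|V|^{-1} \sum_{v \in V} E[(Z - \hat{Z}_v(T))^2] \le \varepsilon$ implies that
$E|Z - \hat{Z}_v(T)| \le |V|\sqrt{\varepsilon}$ for all $v \in V$, and

$$P\Big[|Z - \hat{Z}_v(T)| > \frac{|V|\sqrt{\varepsilon}}{\delta}\Big] \le \delta, \quad \forall v \in V,\ \delta \in (0,1/2]$$

by Markov's inequality. Then by Corollary 12,

$$\tau(\varepsilon) \ge T\Big(\frac{|V|\sqrt{\varepsilon}}{\delta}, \delta\Big) \ge \max_{S \subset V} \frac{1}{C_S}\left((1-\delta)\Big(h(Z|W_S) + \log \frac{\delta}{2e|V|\sqrt{\varepsilon}}\Big) - h_2(\delta)\right).$$

Choosing $\delta = 1/2$, we obtain (54). ■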

The lower bound given by (54) states that $\tau(\varepsilon)$ is necessarily
logarithmic in $\varepsilon^{-1}$, which matches the poly-logarithmic
dependence on $\varepsilon^{-1}$ in the upper bound given by
(53) up to the exponent of the logarithm. As pointed out in Carli et al. [41], it is possible to
prove that a computation time logarithmic in $\varepsilon^{-1}$ is achievable
by embedding a quantized consensus algorithm for noiseless
networks into the simulation framework developed by Rajagopalan
and Schulman for noisy networks in [13].

VI. Conclusion and future research directions 

We have studied the fundamental time limits of distributed
function computation from an information-theoretic perspective.
The computation time depends on the amount of information
about the function value needed by each node and the rate
at which the nodes can accumulate this information.
The small ball probability lower bound on conditional mutual
information reveals how much information is necessary, while
the cutset-capacity upper bound and the SDPI upper bound
capture the bottleneck on the rate at which the information can be
accumulated. The multi-cutset analysis provides a more refined
characterization of the information dissipation in a network.

Here are some questions that are worthwhile to consider in 
the future: 

• In the multi-cutset analysis, the purpose of introducing
self-loops when reducing the network to a chain is to
establish necessary Markov relations for proving upper
bounds on $I(Z; \hat{Z}_n | W_S)$ in bidirected chains, and the
reason for considering left-bound nodes is to improve
the lower bounds on computation time. We could have
included all channels from $S_i$ to $S_i$ into the self-loop at
node i' in G', but this would result in looser lower bounds
on computation time (cf. the remark after Theorem 3).
However, there might be other network reduction methods,
e.g., different ways to construct the bidirected chain,
that will yield even tighter lower bounds on computation
time than our proposed method.

• In the first step of the derivation of Lemma 2 and
Lemma 5, we have upper-bounded $I(Z; \hat{Z}_v | W_S)$ using
the ordinary data processing inequality as

$$I(Z; \hat{Z}_v | W_S) \le I(W_{S^c}; \hat{Z}_v | W_S).$$

One may wonder whether we can tighten this step by a
judicious use of SDPIs. The answer is negative. It can be
shown that

$$I(Z; \hat{Z}_v | W_S) \le \sup_{w_S} \eta\big(P_{W_{S^c}|W_S = w_S}, P_{Z|W_{S^c}, W_S = w_S}\big)\, I(W_{S^c}; \hat{Z}_v | W_S),$$

where the contraction coefficient depends on the joint
distribution of the observations $P_W$ and the function $Z = f(W)$.
However,

$$\eta\big(P_{W_{S^c}|W_S = w_S}, P_{Z|W_{S^c}, W_S = w_S}\big) = 1$$

for both discrete and continuous observations. For discrete
observations, this is a consequence of the fact that
$\eta(P_X, P_{Y|X}) < 1$ if and only if the graph $\{(x,y) : P_X(x) > 0, P_{Y|X}(y|x) > 0\}$
is connected [26], and
the fact that, for any $P_{Y|X}$ induced by a deterministic
function $f : \mathsf{X} \to \mathsf{Y}$, this graph is always disconnected.
This condition can be extended to continuous alphabets
[42]. It would be interesting to see whether nonlinear
SDPIs, e.g., of the sort recently introduced by Polyanskiy
and Wu [28], can be somehow applied here to tighten the
upper bounds.

• If the function to be computed is the identity mapping,
i.e., $Z = W$, then the goal of the nodes is to distribute
their observations to all other nodes in the network. In
this case, our results on the computation time can provide
non-asymptotic lower bounds on the blocklength of the
codes for the source-channel coding problems in multi-terminal
networks. In Example 2, we have considered one
such case with discrete observations, and obtained lower
bounds in Corollary 2 based on the single cutset analysis.
It would be interesting to apply the multi-cutset analysis
to the source-channel coding problems in multi-terminal,
multi-hop networks.


Since the successive partition of G ensures that nodes in $S_i$
can communicate with nodes in $S_j$ only if $|i - j| \le 1$, the
messages originating from $S_i$ at step t can be decomposed as

XSi,t = X( S . Si ),t) 

? X(S.; ; X(Si ,dSi),t ’ Si ,Si\d Si),t )’ 

and the messages received by nodes in <S, at step t can be 
decomposed as 

Y Si ,t = 

= ^ \si,Si\*dSi),d- 

(A.3) 

According to the operation of algorithm A, for each (u. v) £ 
S there exists a mapping ip( u<v ) t t, such that X/ U>v \ t = 
By the definition of dSu we can write 

= {}P(u,v),t(Wui Y u ) : 

(it, v) £ £,u £ dSi,v £ Si- 1 ). 
Thus, there exists a mapping tp s . t , such that

(AA) 

where 

Y*dSi,t~ (Y(Si-i,Si),v 


Appendix A 
Proof of Lemma 7 


By the same token, there exist mappings ip Si t , (ps it t and <PSi,u 
such that 


The goal of this proof is to show that, given any T-step
algorithm $\mathcal{A}$ running on G, we can construct a randomized
T-step algorithm $\mathcal{A}'$ running on G' that simulates $\mathcal{A}$. Fix any
T-step algorithm $\mathcal{A}$ that runs on G. For each t, we can factor
the conditional distribution of the messages $X_t = (X_{v,t})_{v \in V}$
given $W, X^{t-1}, Y^{t-1}$ as follows:


A> iA+l) , t = (A.6) 

= (A.7) 

= VSiAWs^Y*- 1 ). (A.8) 

Define the random variables 


^x t \w,x*-\Y^(xt\w,x t 1 ,y t x ) 

= I°A'„, t | ( x v,t\yJv: Uv ) 


v£V 


riir 

i—l 


X v ,t\W v ,Yv 


^%v,t 


Wv,y- 


t -1 


ip 


Xsi.AWs^Y^VZSil 


-1 (xs u t 


(A.l) 


Likewise, the conditional distribution of the received messages
$Y_t = (Y_{v,t})_{v \in V}$ given $W, X^t, Y^{t-1}$ can be factored as


'^Y t \W,X£Y t - 1 {yt\'W,X t ,y t 

— -^e(2/e,t |*£e,t) 

n 

11 11 | K( u ,v)(y(u,v),t \ x (u,v),t) • 

2=1 uESi 2 ;GV: (u,v)€£ 


Wi = 

Xi y t (-^( 2 , 2 — l),£> -^( 2 , 2 + 1 ),tl t) 

Y i,t 1,2 ),ti ^(2+1,2 ),ti -^(2,2), t) 

— 0 (Si-1,Si),t> I ( Si,*dSi),t )> 

Ui,t = (X(Si,Si\*dS i ),vY(Si,Si\dSi),t)- 

From the decomposition of Yg. t in (A.3), we know that 
(F/- , t/* _1 ) contains Y| _1 ; while from the decomposition of 
Y'y .,, in (A.5), we know that F t_1 contains Y4~\ Therefore, 

oo z ,i i oSi 

from Eqs. (A.4) and (A.6)-(A.8), we deduce the existence of 
mappings Xp i t , ip i t , and (pij, such that the messages 
transmitted by nodes in Si at time t, can be generated as 

), t = (A.9) 

X(i,i+i),t = v^WuYf- 1 ,^- 1 ), (A. 10) 

XM,t = MW i ,Y*- 1 ,U*- 1 ), (A.ll) 

X iSi ,Si\^Si),t = (A-12) 


(A.2) 
Note that the computation of does not involve U *~ 1 .

Next, the messages received by nodes in Si at step t are related 
to the transmitted messages as 




K, 




> Y ( ,_ 


(*— 


K, 


(i+i.i) 


>*7 




K, 


(i,i) 






where the stochastic transition laws have the same form as 
those in Eqs. (26) to (28). In addition, since g.\gg.) t 
and Y( S g.\gg^ t are related through the channels from 5, to 

Si \ dSi, there exists a mapping such that Y ls g.^gg.) t 
can be realized as 


Finally, as we have assumed that dS/s are all nonempty, 
we can define 

Zi±Z v =MW v ,Yj) 

with an arbitrary v £ dSi. From the definition of Y l f and 
the fact that Y' r contains Yj, it follows that there exists a 
mapping ipi such that 

Z i =^ i (W i ,Y l T ). 

Using this mapping, node i' in G' can generate the final 
estimate of the chosen v £ dSi in A as Z,’, such that 
( Z, Zi : i £ {l,...,n}) and (Z,Zii : i £ {l,...,n}) have 
the same joint distribution. This guarantees that 


— *dSi),V (A.13) 

where R l: t can be taken as a random variable uniformly 
distributed over [0,1] and independent of everything else. 
From (A. 12) and (A. 13), we know that Uit can be realized 
by a mapping as 

U ilt =# itt (W z ,Y*~\U t i -\R i>t ). (A.14) 


Taking all of this into account, we can rewrite the factorization 
(A.l) as follows: 


^ > x t \w,x t - 1 ,Y t - l {x t \w,x t 1 ,y t *) 

n 

i =1 

• = Ai,t( w ii Z/i 1 1 

!{•£(*,i),i Vi > u i )} 

' ^{^(SijSiXaSi),* = Vi i u i )}i (A.15) 

and we can rewrite the factorization (A.2) as 


^ > Y t \w,x t ,Yt- 1 (yt\'w,x t ,y t x ) 

n 

2=1 

’ -^( 2 , 2 ) ( 2 /( 2 , 2 ), t\^(i,i), t) 

(u,v)££:u€:Si,v€Si\ dSi 

(A. 16) 


where the channel (g) _ Kt uv) can be re- 

(u,v)££:u£Si,vESi\dSi v ’ ' 

alized by the mapping n i t with the r.v. R i t . 

To summarize: the mappings defined in (A.9) to (A.ll) and 
(A. 14) specify a randomized T-step algorithm A' that runs 
on G' and simulates the T-step algorithm A that runs on G. 
Specifically, using these mappings, each node i' in G' can 
generate all the transmitted and received messages of Si in A 
as Moreover, from (A. 15) and (A. 16) we see 

that the random objects 


and 


(W Si ,Xl,Yl:i£{l,...,n}) 
(Wi,,XZ,Y?,UZ :i' £{l,...,n}) 


have the same joint distribution. 


maxP[d(Z’, Zi>) > el = max P[d(Z, ZA > e] 

2 'GV' iG{l:n} 

< maxP[d(Z, Z v ) > e] 

•uEV 

< <5- 

The claim that $T(\varepsilon,\delta)$ for computing Z on G is lower bounded
by $T'(\varepsilon,\delta)$ for computing Z on G' then follows from the
definition of $T'(\varepsilon,\delta)$ in (29). This proves Lemma 7.


Appendix B 
Proof of Lemma 8 

Recall that, for any randomized T-step algorithm A!, at 
step t £ {1,...,T}, node i £ {1 ,...,n} computes the 
outgoing messages X {i4 _ 1)>t = tp itt (Wi, U/ _1 ), X (M+1))t = 
and = MWuYf-' U*- 1 ), 

and the private message U ijt = r di, t {W i ,Y.f l .ljf 1 ,T ljt ), 
where Ri t is the private randomness of node i. At step T, node 
i computes Zi = ipi(Wi,Y^ r ). We will use the Bayesian net¬ 
work formed by all the relevant variables and the d-separation 
criterion [24, Theorem 3.3] to find conditional independences 
among these variables. To simplify the Bayesian network, we 
merge some of the variables by defining 


and 

Y i>t = (Y(m)t* 6+t.o.t) 

for i £ The joint distribution of the variables can 

then be factored as 


^w,xt,ut,yt{w, x T ,u T ,y T ) 

T n 

= W w {w) = V>i,t{wu 2 /*" 1 )} 

t= 1 2=1 

‘ '^ > Ui t t\Wi,Yf~ 1 ,Ul~ 1 (^*4 1^*) Vi ) ) 

n 

2=1 

’ P h,#wT (i+ i,i),t (B.l) 

The Bayesian network corresponding to this factorization for 
n = 4 and T = 4 is shown in Fig. 10. 

If $T = 0$, then $\widehat{Z}_n = \psi(W_n)$, hence $I(Z;\widehat{Z}_n|W_{2:n}) \le I(Z;W_n|W_{2:n}) = 0$. For $T \ge 1$, we prove the upper bounds in the following steps, where we assume $n \ge 4$. The case $n = 3$ can be proved by skipping Step 2, and the case $n = 2$ can be proved by skipping Step 1 and Step 2.

Step 1: 

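For any $i$ and $t$, define the shorthand $X_{i\leftarrow,t} = (X_{(u,i),t} : u \in \mathcal{N}_{i\leftarrow})$, where $\mathcal{N}_{i\leftarrow}$ is the in-neighborhood of node $i$. From the Markov chain $W, Y_n^{T-1} \to X_{n\leftarrow,T} \to Y_{n,T}$ and Lemma 3, we follow the same argument as the one used for proving Lemma 5 to show that

$$
\begin{aligned}
I(Z;\widehat{Z}_n|W_{2:n}) &\le I(W_1;Y_n^T|W_{2:n}) \\
&\le (1-\eta_n)\, I(W_1;Y_n^{T-1}|W_{2:n}) + \eta_n\, I(W_1;Y_n^{T-1},X_{n\leftarrow,T}|W_{2:n}).
\end{aligned}
$$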

Applying the d-separation criterion to the Bayesian network corresponding to (B.1) (see Fig. 10 for an illustration), we can read off the Markov chain

$$W_1 \to W_{2:n}, Y_{n-1}^{t-1} \to Y_n^{t-1}, U_{n-1,t}, U_{n,t}$$

for $t \in \{1,\ldots,T\}$, since all trails from $W_1$ to $(Y_n^{t-1}, U_{n-1,t}, U_{n,t})$ are blocked by $(W_{2:n}, Y_{n-1}^{t-1})$, and all trails from $(Y_n^{t-1}, U_{n-1,t}, U_{n,t})$ to $W_1$ are blocked by $(W_{2:n}, Y_{n-1}^{t-1})$. This implies the Markov chain $W_1 \to W_{2:n}, Y_{n-1}^{T-1} \to Y_n^{T-1}, X_{n\leftarrow,T}$, since $X_{(n-1,n),T}$ is included in $U_{n-1,T}$ and $X_{(n,n),T}$ is included in $U_{n,T}$. Consequently,$^1$

$$I(W_1;Y_n^T|W_{2:n}) \le (1-\eta_n)\, I(W_1;Y_n^{T-1}|W_{2:n}) + \eta_n\, I(W_1;Y_{n-1}^{T-1}|W_{2:n}). \tag{B.2}$$

Also note that $I(W_1;Y_n^1|W_{2:n}) \le I(W_1;X_{n\leftarrow,1}|W_{2:n}) \le I(W_1;W_{\mathcal{N}_{n\leftarrow}}|W_{2:n}) = 0$.

Step 2: 

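For $i \in \{1,\ldots,n-3\}$, from the Markov chain $W, Y_{n-i}^{T-i-1} \to X_{(n-i)\leftarrow,T-i} \to Y_{n-i,T-i}$ and Lemma 3,

$$
\begin{aligned}
I(W_1;Y_{n-i}^{T-i}|W_{2:n}) \le{}& (1-\eta_{n-i})\, I(W_1;Y_{n-i}^{T-i-1}|W_{2:n}) \\
&+ \eta_{n-i}\, I(W_1;Y_{n-i}^{T-i-1},X_{(n-i)\leftarrow,T-i}|W_{2:n}).
\end{aligned}
$$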

From the Bayesian network corresponding to (B.1), we can read off the Markov chain

$$W_1 \to W_{2:n}, Y_{n-i-1}^{t-1} \to Y_{n-i}^{t-1}, U_{n-i-1,t}, U_{n-i,t}, X_{(n-i+1,n-i),t}$$

for $t \in \{1,\ldots,T-i\}$, since all trails from $W_1$ to $(Y_{n-i}^{t-1}, U_{n-i-1,t}, U_{n-i,t}, X_{(n-i+1,n-i),t})$ are blocked by $(W_{2:n}, Y_{n-i-1}^{t-1})$, and all trails from $(Y_{n-i}^{t-1}, U_{n-i-1,t}, U_{n-i,t}, X_{(n-i+1,n-i),t})$ to $W_1$ are blocked by $(W_{2:n}, Y_{n-i-1}^{t-1})$. This implies the Markov chain

$$W_1 \to W_{2:n}, Y_{n-i-1}^{T-i-1} \to Y_{n-i}^{T-i-1}, X_{(n-i)\leftarrow,T-i},$$

since $X_{(n-i-1,n-i),T-i}$ is included in $U_{n-i-1,T-i}$ and $X_{(n-i,n-i),T-i}$ is included in $U_{n-i,T-i}$. Therefore,

$$I(W_1;Y_{n-i}^{T-i}|W_{2:n}) \le (1-\eta_{n-i})\, I(W_1;Y_{n-i}^{T-i-1}|W_{2:n}) + \eta_{n-i}\, I(W_1;Y_{n-i-1}^{T-i-1}|W_{2:n}) \tag{B.3}$$

for $i \in \{1,\ldots,n-3\}$. Also note that

$$I(W_1;Y_{n-i}^{1}|W_{2:n}) \le I(W_1;X_{(n-i)\leftarrow,1}|W_{2:n}) \le I(W_1;W_{\mathcal{N}_{(n-i)\leftarrow}}|W_{2:n}) = 0.$$


Step 3: 

Finally, we upper-bound $I(W_1;Y_2^{T-n+2}|W_{2:n})$ for $T \ge n-1$. From the Markov chain $W, Y_2^{t-1} \to X_{2\leftarrow,t} \to Y_{2,t}$ and Lemma 3,

$$I(W_1;Y_2^{T-n+2}|W_{2:n}) \le (1-\eta_2)\, I(W_1;Y_2^{T-n+1}|W_{2:n}) + \eta_2\, H(W_1|W_{2:n}). \tag{B.4}$$

This upper bound is useful only when $H(W_1|W_{2:n})$ is finite. If the observations are continuous r.v.'s, we can upper bound $I(W_1;Y_2^{T-n+2}|W_{2:n})$ in terms of the channel capacity $C_{(1,2)}$:

$$
\begin{aligned}
I(W_1;Y_2^{T-n+2}|W_{2:n})
&= \sum_{t=1}^{T-n+2} I(W_1;Y_{2,t}|W_{2:n},Y_2^{t-1}) \\
&\overset{(a)}{=} \sum_{t=1}^{T-n+2} \Big( I(W_1;Y_{(1,2),t}|W_{2:n},Y_2^{t-1}) + I(W_1;Y_{2,t}|W_{2:n},Y_2^{t-1},Y_{(1,2),t}) \Big) \\
&\overset{(b)}{\le} \sum_{t=1}^{T-n+2} I(X_{(1,2),t};Y_{(1,2),t}|W_{2:n},Y_2^{t-1}) \\
&\overset{(c)}{\le} \sum_{t=1}^{T-n+2} I(X_{(1,2),t};Y_{(1,2),t}) \\
&\le C_{(1,2)}(T-n+2),
\end{aligned}
\tag{B.5}
$$

where we have used the Markov chain $W_1 \to W_{2:n}, Y_2^{t-1}, Y_{(1,2),t} \to Y_{2,t}$ for $t \in \{1,\ldots,T-n+2\}$, which follows by applying the d-separation criterion to the Bayesian network corresponding to the factorization in (B.1), so that the second term in (a) is zero; the Markov chain $W, Y_2^{t-1} \to X_{(1,2),t} \to Y_{(1,2),t}$, which also implies the Markov chain $W_1 \to X_{(1,2),t}, W_{2:n}, Y_2^{t-1} \to Y_{(1,2),t}$ by the weak union property of conditional independence, hence (b) and (c); and the fact that $I(X_{(1,2),t};Y_{(1,2),t}) \le C_{(1,2)}$.
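
For readers who want the intermediate algebra behind steps (b) and (c) of (B.5), the chain-rule argument can be written out as follows. This is only a sketch, in shorthand introduced here for illustration and not used in the original text: $X = X_{(1,2),t}$, $Y = Y_{(1,2),t}$, and $V = (W_{2:n}, Y_2^{t-1})$.

```latex
% (b): the Markov chain W_1 -> (X, V) -> Y gives I(W_1; Y | X, V) = 0
I(W_1; Y \mid V) \le I(W_1, X; Y \mid V)
  = I(X; Y \mid V) + I(W_1; Y \mid X, V)
  = I(X; Y \mid V).

% (c): the Markov chain V -> X -> Y (implied by W, Y_2^{t-1} -> X -> Y)
%      gives I(V; Y | X) = 0
I(X; Y \mid V) \le I(X, V; Y)
  = I(X; Y) + I(V; Y \mid X)
  = I(X; Y).
```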

Step 4: 

Define $I_{i,t} = I(W_1;Y_i^t|W_{2:n})$ for $i \ge 2$ and $t \ge 1$. From (B.2), (B.3), (B.4), and (B.5), we can write, for $n \ge 3$, $T \ge n-1$, and $i \in \{0,\ldots,n-3\}$,

$$I_{n-i,T-i} \le \bar\eta_{n-i}\, I_{n-i,T-i-1} + \eta_{n-i}\, I_{n-i-1,T-i-1} \tag{B.6}$$

where $\bar\eta_{n-i} = 1 - \eta_{n-i}$, and $I_{n-i,1} = 0$. In addition, for $T \ge n-1$,

$$
I_{2,T-n+2} \le
\begin{cases}
\bar\eta_2\, I_{2,T-n+1} + \eta_2\, H(W_1|W_{2:n}) \\
C_{(1,2)}(T-n+2)
\end{cases}
\tag{B.7}
$$

and $I_{2,0} = 0$.

$^1$This follows from the ordinary DPI and from the fact that, if $X \to A, B \to C$ is a Markov chain, then $X \to B \to C$ is a Markov chain conditioned on $A = a$.

An upper bound on $I(W_1;Y_n^T|W_{2:n})$ can be obtained by solving this set of recursive inequalities with the specified boundary conditions. It can be checked by induction that $I(W_1;Y_n^T|W_{2:n}) = 0$ if $T \le n-2$. For $T \ge n-1$, if $\eta_i \le \tilde\eta$ for all $i \in \{1,\ldots,n\}$, then the above inequalities continue to hold with the $\eta_i$'s replaced with $\tilde\eta$. The resulting set of inequalities is similar to the one obtained by Rajagopalan and Schulman [13] for the evolution of mutual information in broadcasting a bit over a unidirectional chain of BSCs. With

$$B(m,k,p) \triangleq \binom{m}{k} p^k (1-p)^{m-k},$$

the exact solution is given by

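$$
\begin{aligned}
I(W_1;Y_n^T|W_{2:n})
&\le H(W_1|W_{2:n})\,\tilde\eta \sum_{i=1}^{T-n+2} \binom{T-i}{n-2}\, \tilde\eta^{\,n-2} (1-\tilde\eta)^{T-i-n+2} \\
&= H(W_1|W_{2:n})\,\tilde\eta \sum_{i=1}^{T-n+2} B(T-i,\, n-2,\, \tilde\eta)
\end{aligned}
$$

for $n \ge 2$, and

$$
\begin{aligned}
I(W_1;Y_n^T|W_{2:n})
&\le C_{(1,2)}\,\tilde\eta \sum_{i=1}^{T-n+2} \tilde\eta^{\,n-3} (1-\tilde\eta)^{T-i-n+2} \binom{T-i-1}{n-3}\, i \\
&= C_{(1,2)}\,\tilde\eta \sum_{i=1}^{T-n+2} B(T-i-1,\, n-3,\, \tilde\eta)\, i
\end{aligned}
$$

for $n \ge 3$. This proves (30a) and (30b).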
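
As a numerical sanity check, the recursions (B.6) and (B.7), taken with equality and a common contraction coefficient, can be iterated directly and compared with the binomial-sum expressions. The sketch below is not part of the original proof. The function names and the parameter values are illustrative assumptions; `H` stands for $H(W_1|W_{2:n})$, `C` for $C_{(1,2)}$, and `eta` for the common coefficient.

```python
from math import comb


def binom_pmf(m, k, p):
    """B(m, k, p) = C(m, k) * p**k * (1 - p)**(m - k)."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def solve_recursion(n, T, eta, node2):
    """Iterate I[k][t] = (1 - eta) * I[k][t-1] + eta * I[k-1][t-1] with equality.

    node2(t) supplies the node-2 sequence I[2][t]; I[k][0] = 0 for k >= 3.
    Returns I[n][T].
    """
    prev = [node2(t) for t in range(T + 1)]  # values at node 2
    for _ in range(3, n + 1):                # propagate to nodes 3..n
        cur = [0.0] * (T + 1)
        for t in range(1, T + 1):
            cur[t] = (1 - eta) * cur[t - 1] + eta * prev[t - 1]
        prev = cur
    return prev[T]


def closed_form_entropy(n, T, eta, H):
    """H * eta * sum_{i=1}^{T-n+2} B(T-i, n-2, eta); zero if the sum is empty."""
    return H * eta * sum(binom_pmf(T - i, n - 2, eta) for i in range(1, T - n + 3))


def closed_form_capacity(n, T, eta, C):
    """C * eta * sum_{i=1}^{T-n+2} B(T-i-1, n-3, eta) * i, for n >= 3."""
    return C * eta * sum(
        i * binom_pmf(T - i - 1, n - 3, eta) for i in range(1, T - n + 3)
    )


if __name__ == "__main__":
    n, T, eta, H, C = 5, 9, 0.3, 1.0, 0.5  # illustrative values only
    # Entropy branch: node 2 follows I[2][t] = H * (1 - (1 - eta)**t).
    print(solve_recursion(n, T, eta, lambda t: H * (1 - (1 - eta) ** t)),
          closed_form_entropy(n, T, eta, H))
    # Capacity branch: node 2 follows I[2][t] = C * t.
    print(solve_recursion(n, T, eta, lambda t: C * t),
          closed_form_capacity(n, T, eta, C))
```

In each printed pair the two numbers should agree up to floating-point error, which is one way to see that the binomial sums solve the recursion when the inequalities are replaced by equalities.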

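For general $\eta_i$'s, we obtain a suboptimal upper bound by unrolling the first term in (B.6) for each $i$ and using the fact that $I_{n-i,t} = 0$ for $t \le n-i-2$, getting

$$
\begin{aligned}
I_{n-i,T-i} &\le \bar\eta_{n-i}^{\,T-n+1}\eta_{n-i}\, I_{n-i-1,n-i-2} + \cdots + \bar\eta_{n-i}\eta_{n-i}\, I_{n-i-1,T-i-2} + \eta_{n-i}\, I_{n-i-1,T-i-1} \\
&\le \big(\bar\eta_{n-i}^{\,T-n+1} + \cdots + \bar\eta_{n-i} + 1\big)\, \eta_{n-i}\, I_{n-i-1,T-i-1}.
\end{aligned}
$$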

Iterating over $i$, and noting that

$$I_{2,T-n+2} \le \min\big\{ H(W_1|W_{2:n})\big(1 - \bar\eta_2^{\,T-n+2}\big),\; C_{(1,2)}(T-n+2) \big\},$$

we get for $n \ge 2$ and $T \ge n-1$,

$$
I(W_1;Y_n^T|W_{2:n}) \le
\begin{cases}
H(W_1|W_{2:n}) \prod_{i=2}^{n} \big(1 - (1-\eta_i)^{T-n+2}\big) \\
C_{(1,2)}(T-n+2) \prod_{i=3}^{n} \big(1 - (1-\eta_i)^{T-n+2}\big).
\end{cases}
\tag{B.8}
$$

The weakened upper bounds in (31a) and (31b) are obtained by replacing $\eta_i$ in (B.8) with $\tilde\eta = \max_{i=1,\ldots,n} \eta_i$.
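
To illustrate how the product bound in (B.8) relates to the underlying recursion, the following sketch iterates (B.6) and the first branch of (B.7) with equality for node-dependent coefficients and compares the result with the entropy branch of (B.8). It is an illustration under stated assumptions, not the authors' code. The function names, the dictionary layout `eta[k]` for node $k$, and the numerical values are all hypothetical.

```python
def exact_recursion(eta, n, T, H):
    """Iterate the recursion with equality for heterogeneous coefficients.

    eta maps each node k in {2, ..., n} to its coefficient eta_k.
    Node 2: I[2][t] = (1 - eta_2) * I[2][t-1] + eta_2 * H, with I[2][0] = 0.
    Node k >= 3: I[k][t] = (1 - eta_k) * I[k][t-1] + eta_k * I[k-1][t-1].
    Returns I[n][T].
    """
    prev = [0.0] * (T + 1)
    for t in range(1, T + 1):
        prev[t] = (1 - eta[2]) * prev[t - 1] + eta[2] * H
    for k in range(3, n + 1):
        cur = [0.0] * (T + 1)
        for t in range(1, T + 1):
            cur[t] = (1 - eta[k]) * cur[t - 1] + eta[k] * prev[t - 1]
        prev = cur
    return prev[T]


def product_bound(eta, n, T, H):
    """Entropy branch of (B.8): H * prod_{k=2}^{n} (1 - (1 - eta_k)**(T-n+2))."""
    m = T - n + 2
    if m <= 0:  # the information from node 1 has not reached node n yet
        return 0.0
    bound = H
    for k in range(2, n + 1):
        bound *= 1 - (1 - eta[k]) ** m
    return bound


if __name__ == "__main__":
    eta = {2: 0.2, 3: 0.7, 4: 0.4, 5: 0.9}  # illustrative values only
    n, H = 5, 1.0
    uniform = {k: max(eta.values()) for k in eta}  # all eta_k replaced by the max
    for T in range(4, 10):
        print(T, exact_recursion(eta, n, T, H),
              product_bound(eta, n, T, H),
              product_bound(uniform, n, T, H))
```

Each row should be nondecreasing from left to right: the unrolled product dominates the exact recursion, and replacing every coefficient by the maximum loosens the bound further.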

Finally, we show (32) using an argument similar to the one in [13]. If $n \ge 4$ and $T \le 2 + (n-3)\gamma/\tilde\eta$ for some $\gamma \in (0,1)$, then

$$\tilde\eta < \frac{\tilde\eta}{\gamma} \le \frac{n-3}{T-2} \le \frac{n-2}{T-1} \le 1,$$

where the last inequality follows from the assumption that $T \ge n-1$, since otherwise $I(Z;\widehat{Z}_n|W_{2:n}) = 0$. The upper
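bounds in (30a) and (30b) can be weakened to

$$
\begin{aligned}
I(Z;\widehat{Z}_n|W_{2:n})
&\overset{(a)}{\le}
\begin{cases}
H(W_1|W_{2:n})\,\tilde\eta\,(T-n+2)\,B(T-1,\, n-2,\, \tilde\eta) \\
C_{(1,2)}\,\tilde\eta\,(T-n+2)^2\,B(T-2,\, n-3,\, \tilde\eta)
\end{cases} \\
&\overset{(b)}{\le} \min\big\{H(W_1|W_{2:n}),\, C_{(1,2)}\big\}\,\tilde\eta\,(T-n+2)^2\,B(T-2,\, n-3,\, \tilde\eta) \\
&\overset{(c)}{\le} C_{(1,2)}\,\tilde\eta\,(T-n+2)^2 \exp\!\left(-2\left(\frac{n-3}{T-2} - \tilde\eta\right)^{2} (T-2)\right) \\
&\overset{(d)}{\le} C_{(1,2)}\,\frac{(n-3)^2\gamma^2}{\tilde\eta} \exp\!\left(-2\left(\frac{\tilde\eta}{\gamma} - \tilde\eta\right)^{2} (n-3)\right),
\end{aligned}
$$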


where 

(a) and (b) follow from monotonicity properties of the 
binomial distribution; 

(c) follows from the Chernoff-Hoeffding bound; 

(d) follows from the fact that the channels associated with 8s are independent, and from the assumption that $n \ge 4$ and $n - 1 \le T \le 2 + (n-3)\gamma/\tilde\eta$.
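
The Chernoff-Hoeffding step and the chain of inequalities on $\tilde\eta$ can be checked numerically for particular parameter values. The sketch below assumes the standard Hoeffding tail bound $\mathbb{P}[\mathrm{Bin}(m,p) \ge k] \le \exp(-2m(k/m - p)^2)$ for $k/m \ge p$, which in particular bounds the single term $B(m,k,p)$. The function names and the values of `n`, `eta` and `gamma` are illustrative assumptions and do not come from the original text.

```python
from math import comb, exp


def binom_pmf(m, k, p):
    """B(m, k, p) = C(m, k) * p**k * (1 - p)**(m - k)."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def hoeffding_bound(m, k, p):
    """exp(-2 m (k/m - p)^2); bounds P[Bin(m, p) >= k] when k/m >= p."""
    if k / m < p:
        raise ValueError("the bound is only used in the regime k/m >= p")
    return exp(-2 * m * (k / m - p) ** 2)


def chain_holds(n, T, eta, gamma):
    """Check eta <= eta/gamma <= (n-3)/(T-2) <= (n-2)/(T-1) <= 1 (needs T > 2)."""
    return eta <= eta / gamma <= (n - 3) / (T - 2) <= (n - 2) / (T - 1) <= 1


if __name__ == "__main__":
    n, eta, gamma = 10, 0.1, 0.5            # illustrative values only
    T_max = int(2 + (n - 3) * gamma / eta)  # upper end of the regime for T
    for T in range(n - 1, T_max):
        m, k = T - 2, n - 3                 # arguments of B(T-2, n-3, eta)
        print(T, chain_holds(n, T, eta, gamma),
              binom_pmf(m, k, eta), hoeffding_bound(m, k, eta))
```

Within the regime $n - 1 \le T \le 2 + (n-3)\gamma/\tilde\eta$, the ratio $(n-3)/(T-2)$ stays at or above $\tilde\eta/\gamma > \tilde\eta$, which is the condition under which the exponential bound applies.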
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